Around around we go… – DarkBlueMonkey's Site

For my job, I’ve been asked to look at all the different AI vendors. I’ve grabbed myself logins to most of the ‘big ones’, and a few of the newer upstarts. I’ve been running queries through them to see how they do. I’m positive that the ‘plan’ is to incorporate more AI into our daily workstreams…

Well, all I can say is “meh”. The sheer levels of hallucination are crazy to me. They all just seem to make stuff up as they go. Copilot is the absolute worst. Perhaps it’s the version we’re using at work, but my god does it hallucinate. GPT comes second. The very idea that a business could run effectively with tools that are so prone to mistakes is hilarious. I know they’ll get better over time, but anyone trying to be an ‘early adopter’ needs their head seeing to in my honest opinion.

At the very bottom of the chain, creatives are generating the actual content. Meanwhile the plagiarism machines are just copying off each other ad infinitum…

For starters, it feels like they’re all the bloody same at the moment. I’m hoping that they start to diverge in different directions. Copilot seems slightly better at writing ‘business bullshit’. GPT seems better at understanding human-level stuff, while Claude seems slightly better at writing code. Gemini seems pretty good at most things, and seems to understand the physical world better (positional relationships between objects etc). I can see them going in those directions.

The US techbros are aiming for “AGI” (Artificial General Intelligence), i.e. a machine that’s generally just smart, and can be independent in its training and reasoning. It feels like the LLM engines are a good effort to generate a reasoning system, but there’s still something missing… It just feels like the larger they make the engine, the dumber it gets. It’s kind of like when you mix all the different colours of the rainbow together: you end up with muddy brown.

“Agentic” (I still don’t like that word) solutions feel like they’ll be the best way out. That keeps all the different colours separate, and prevents them running together into a muddy brown. Let the agents be the best at one particular subject, and then chat together to be ‘smart’ as a whole.
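A very rough sketch of that idea, in Python. Everything here is made up for illustration: the three "agents" are plain stand-in functions rather than real model calls, and the keyword routing table is an assumption about how tasks might be handed out, not how any vendor actually does it.

```python
from typing import Callable, Dict, List

# Hypothetical specialists: each stands in for a separate model or service.
def code_agent(task: str) -> str:
    return f"[code] {task}"

def business_agent(task: str) -> str:
    return f"[business] {task}"

def general_agent(task: str) -> str:
    return f"[general] {task}"

# Keyword-to-specialist routing table (invented for this example).
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "code": code_agent,
    "report": business_agent,
}

def route(task: str) -> Callable[[str], str]:
    """Pick the first specialist whose keyword appears in the task."""
    lowered = task.lower()
    for keyword, agent in SPECIALISTS.items():
        if keyword in lowered:
            return agent
    # Nothing matched, so fall back to the generalist.
    return general_agent

def run(tasks: List[str]) -> List[str]:
    """Send each task to exactly one specialist and collect the answers."""
    return [route(task)(task) for task in tasks]
```

Each task goes to exactly one specialist, so the colours stay separate; the generalist only picks up whatever nobody else claims.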

Until all the different LLM vendors really decide which direction they want to go, and plough their own furrow, I can’t really decide which to use. They’re all much of a muchness, so we’ll stick with GPT for our experimentation.