Discussion about this post

Jonno

I've followed your takes on AI development with interest for a while, but this piece is just another level. Easily the best analytical piece within the broad "AI future" field I've read in a long time. This is the kind of healthy AI skepticism we need more of - and there are still many more avenues to explore. Kudos!

Simon Kinahan

I suspect the answer to AI unreliability is more AI. LLMs mimic humans, and humans are unreliable. How do we manage human unreliability? More humans.

The software industry is (maybe for the first time ever) a model here. Almost always when software fails it is because a human made a mistake. The software industry is built around processes of testing and review that screen errors down to an acceptable (but far from zero) level, and then react to address further errors as they appear. Almost the entire software lifecycle, except the tiny fun part at the start, is about addressing human cognitive failure.

Within that process there are multiple human roles, but each of them can be formalized as either encoding informal language into code or decoding code into informal language, so each could, potentially, be played by an LLM. Will they be as good as humans? Definitely not as good as the best humans. But will they be acceptable? Probably. And they're much faster, so to some extent you can replace ability with iteration speed. If you have different models, with only limited shared context, writing specs, writing code, writing and running tests, doing hands-on testing, filing bugs and reviewing code, and they do all of it faster than humans could, is the net result as reliable as a human result? I don't see why it shouldn't be.

I have a suspicion, though, that the cost of inference is still too high. As things stand right now, we could easily double the fully loaded cost of a developer if we let them use as many tokens as they wanted to automate their whole workflow.
