A simple analogy for why the “using LLMs to control LLMs” approach is flawed:
It’s like training ten mice to control seven chinchillas, who will control four mongooses, controlling three raccoons, who will rein in one tiger.
A lot has to go right for this to work, and you’d better hope there aren’t any capability jumps as large as the one between raccoons and tigers.
I just wanted to release this analogy into the wild so that anyone doing public- or policy-facing communication can pick it up if it’s useful for persuasion.
The analogy falters a bit when applied to research proposals that use advanced AI to police itself (i.e., tigers controlling tigers). I hope we can scale robust versions of that.