I expect that we will probably end up doing something like this, whether it is workable in practice or not, if for no other reason than that it seems to be the most common plan that anyone in a position to actually implement any plan at all has devised and publicized. I appreciate seeing it laid out in so much detail.
By analogy, it certainly rhymes with the way I use LLMs to answer fuzzy, complex questions now. I have a conversation with o3-mini to get all the key background I can into the context window, have it write a prompt to pass the conversation on to o1-pro, repeat until I have o1-pro write a prompt for Deep Research, and then answer Deep Research's clarifying questions before giving it the go-ahead. It definitely works better for me than trying to write the Deep Research prompt directly. But part of the reason it works better is that at each step, the next-higher-capability model comes back with clarifying questions about variables I hadn't noticed were unspecified, and which the previous model also hadn't noticed were unspecified. In fact, if I give the same prompt to Deep Research multiple times in different chats, it comes back with somewhat different sets of clarifying questions; it isn't actually set up to track down every unknown variable it can identify. This reinforces that even fairly straightforward fuzzy, complex questions carry a lot of unstated assumptions.
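For concreteness, here is a minimal sketch of that escalation loop in Python. The `ask()` callable and the `answer_clarifications()` hook are hypothetical stand-ins for whatever chat API and human-in-the-loop step are actually in use, and the model names are illustrative labels rather than literal API identifiers.

```python
# A minimal sketch of the escalation workflow described above. ask() and
# answer_clarifications() are assumed stand-ins, not a real library's API.
from typing import Callable, Dict, List

Message = Dict[str, str]


def escalate(
    question: str,
    chain: List[str],
    ask: Callable[[str, List[Message]], str],
    answer_clarifications: Callable[[str], str],
) -> str:
    """Pass a fuzzy question up a chain of increasingly capable models.

    At each step, the more capable model surfaces the variables the previous
    step left unspecified, a human answers them, and the model then writes
    the prompt handed to the next model in the chain.
    """
    context: List[Message] = [{"role": "user", "content": question}]
    prompt = question
    for model in chain:
        # Ask this model which variables are still unspecified.
        clarifying = ask(model, context + [
            {"role": "user", "content": "What about this request is still unspecified?"}
        ])
        # Keep the human in the loop to answer the clarifying questions.
        answers = answer_clarifications(clarifying)
        context += [
            {"role": "assistant", "content": clarifying},
            {"role": "user", "content": answers},
        ]
        # Have this model write the prompt the next, more capable model sees.
        prompt = ask(model, context + [
            {"role": "user", "content": "Write a complete prompt for a more capable model."}
        ])
        context.append({"role": "assistant", "content": prompt})
    return prompt


# Example wiring (names illustrative): the final prompt goes to Deep Research,
# whose own clarifying questions the human still answers before the run.
# final_prompt = escalate(my_question, ["o3-mini", "o1-pro"], ask=my_chat_call,
#                         answer_clarifications=lambda qs: input(qs + "\n> "))
```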
If Deep Research can’t look at the full previous context and correctly guess what I intended, then it is not plausible that o1-pro or o3-mini could have done so. I have in fact tested this: the earlier models either respond that they don’t know the answer, or give an answer that’s better than chance but not consistently correct. Now, I get that you’re talking about future models and systems at higher capability levels generally, but adding more steps to the chain doesn’t actually fix this problem. If any given link can’t anticipate the clarifying questions and correctly intuit what the values of the unspecified variables should be, then the plan fails, because the model before it in the chain will be even worse at this. If it can, then it doesn’t need to ask the previous model in the chain at all. The final model will either get it right on its own, or end up with incorrect answers to some of the questions about what it’s trying to achieve. It may still ask, if the earlier models are more compute-efficient and add information, but it doesn’t strictly need them.
And unfortunately, keeping the human in the loop doesn’t solve this either. We very often don’t know what we actually want well enough to correctly answer every clarifying question a high-capability model could pose. And if we have a set of intervening models approximating and abstracting the real-but-too-hard question into something a human can think about, well, that’s a lot of translation steps at which information gets lost. I’ve played that game of telephone among humans often enough to know it only rarely works (“You’re not socially empowered to go to the Board with this, but if you put this figure, with this title, phrased this way, in this conversation, and give it to your boss with these notes to present to his boss, it’ll percolate up through the remaining layers of management”).
Is there a capability level at which the first model can look at its full corpus of data on humanity and correctly figure out the answers to the second model’s clarifying questions? I expect so. But the path to that model is the one you drew a big red X through in the first figure for being the harder path. I’m sure there are ways less-capable-than-AGI systems can help us build that model, but I don’t think you’ve told us what they are.