I sincerely hope that if anyone has a concrete, actionable answer to this question, they’re smart enough not to share it publicly, for what I hope are obvious reasons.
But aside from that caveat, I think you are making several incorrect assumptions.
“There is no massive corpus of such strategies that can be used as training data”
The AI has, at minimum, access-in-principle to everything that has ever been written or otherwise recorded, including all fiction, all historical records, and all analysis of both. This includes many, many, many examples and discussions of plans, successful and not, and detailed discussions of why humans believe they succeeded or failed.
“(a) doing real-world experiments (whereby generating sufficient data would be far too slow and costly, or simply impossible)”
People have already handed substantial amounts of crypto to at least one AI, which it can use to autonomously act in the real world by paying humans. What do you see as the upper bound on this, and why?
I think most people greatly overestimate how much of this is actually needed for many kinds of goals. What do you see as the upper bound for what can, in principle, be done with a plan that an army of IQ-180 humans (aka no better qualitative thinking than what the smartest humans can do, so that this is a strict lower bound on ASI capabilities) came up with over subjective millennia with access to all recorded information that currently exists in the world? Assume the plan includes the capability to act in parallel, at scale, and the ability to branch its actions based on continued observation, just like groups of humans can, but with much better coordination within the group.
“(b) a comprehensive world-model that is capable of predicting the results of proposed actions”
See above. I’m not sure what you see as the upper bound on how good such a world model can be, or is likely to become.
One answer is “Because we’re going to have long since handed it thousands to billions of bodies to operate in the world, and problems to come up with plans to solve, and compute to use to execute and revise those plans.” Even without the bodies, we’re already doing this.
Current non-superintelligent AIs already come up with hypotheses, plans to test them, means to revise those plans, and checks against past data, all the time, with increasing success rates over a widening range of problems. This is synthetic data we’re already paying to generate.
Also, have you ever run a plan (or anything else) by an LLM and asked it to find flaws, suggest solutions, and estimate the probability of success? This is already very useful for improving on human success rates across many domains.
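For concreteness, here is a minimal sketch of the kind of propose-critique-revise loop I mean, using the OpenAI Python SDK. The model name, prompt wording, and number of rounds are illustrative assumptions on my part; any capable model and phrasing would do:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Send one prompt to the model and return its text reply."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice; substitute any capable model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

def refine_plan(goal: str, rounds: int = 3) -> str:
    """Propose a plan, then repeatedly critique and revise it."""
    plan = ask(f"Propose a step-by-step plan to achieve this goal: {goal}")
    for _ in range(rounds):
        critique = ask(
            "Find flaws in the following plan, suggest fixes, and estimate "
            f"its probability of success:\n\n{plan}"
        )
        plan = ask(
            f"Revise the plan to address this critique.\n\nPlan:\n{plan}\n\n"
            f"Critique:\n{critique}"
        )
    return plan
```

Nothing here is exotic: it’s just the find-flaws-and-revise step above, automated, and it’s the same kind of loop already being run at much larger scale to generate synthetic training data.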
“Plans for achieving such goals are not amenable to simulation because you can’t easily predict or evaluate the outcome of any proposed action.”
It’s actually very easy to get current LLMs to generate hypothetical actions well outside a narrow domain if you explain to them that the stakes are unusually high. We’re not talking about a traditional chess engine thinking outside the rules of chess. We’re talking about systems whose currently-existing predecessors are increasingly broadly capable of finding solutions to open-ended problems using all available tools. This includes capabilities like deception, lying, cheating, stealing, giving synthesis instructions to make drugs, and explaining how to hire a hitman.
Any plan a human can come up with without having personally conducted groundbreaking relevant experiments is a plan that exists within, or is implied by, the combined corpus of training data available to an AI. This includes, for example, everything ever written by this community or anyone else, and everything anyone ever thought upon reading everything ever written by this community or anyone else.
Strictly speaking, I only presupposed that an AI could reach close to the limits of human intelligence in terms of thinking ability, but with the inherent speed, parallelizability, and memory advantages of a digital mind.
In small ways (aka sized appropriately for current AI capabilities) this kind of thing shows up all the time in chains of thought in response to all kinds of prompts, to the point that no, I don’t have specific examples, because I wouldn’t know how to pick one. The one that first comes to mind, I guess, was using AI to help me develop a personalized nutrition/supplement/weight loss/training regimen.
That’s fair, and a reasonable thing to discuss. After all, the fundamental claim of the book’s title is about a conditional probability: IF it turns out that anything like our current methods scales to superintelligent agents, we’d all be screwed.