[Question] How does the current AI paradigm give rise to the “superagency” that IABIED is concerned with?

jchan29 Sep 2025 15:23 UTC

3 points

Modern AI works by throwing lots of computing power at lots of data. An LLM gets good at generating text by ingesting an enormous corpus of human-written text. A chess AI doesn’t have as big a corpus to work with, but it can generate simulated data through self-play, which works because the criterion for success (“Did we achieve checkmate?”) is easy to evaluate without any deep preexisting understanding. But the same is not true if we’re trying to build an AI with generalized agency, i.e. something that outputs strategies for achieving some real-world goal, which are actually effective when carried out. There is no massive corpus of such strategies that can be used as training data, nor is it possible to simulate one, since that would require either (a) doing real-world experiments (whereby generating sufficient data would be far too slow and costly, or simply impossible) or (b) a comprehensive world-model that is capable of predicting the results of proposed actions (which presupposes the thing whose feasibility is at issue in the first place). Therefore it seems unlikely that AIs built under the current paradigm (deep neural networks + big data + gradient descent) will ever achieve the kind of “superintelligent agency” depicted in the latter half of IABIED, which can devise effective strategies for wiping out humanity (or whatever).

By “real-world goal” I mean a goal whose search-space is not restricted to a certain well-defined and legible domain, but ranges over all possible actions, events, and counter-actions. Plans for achieving such goals are not amenable to simulation because you can’t easily predict or evaluate the outcome of any proposed action. All of the extinction scenarios posited in IABIED are “games” of this kind. By contrast, a chess AI will never conceive of strategies like “Hire a TaskRabbit to surreptitiously drug your opponent so that they can’t think straight during the game,” and not for lack of intelligence, but because such strategies simply don’t exist in the AI’s training domain.

This was the main lingering question I had after reading IABIED.

jchan29 Sep 2025 15:23 UTC

3 points

4 comments1 min readLW link

AI Agency IABIED

AnthonyC 29 Sep 2025 17:28 UTC
7 points
3
I sincerely hope that if anyone has a concrete, actionable answer to this question, that they’re smart enough not to share it publicly, for what I hope are obvious reasons.
But aside from that caveat, I think you are making several incorrect assumptions.
1. “There is no massive corpus of such strategies that can be used as training data”
  1. The AI has, at minimum access-in-principle to everything that has ever been written or otherwise recorded, including all fiction, all historical records, and all analysis of both of those. This includes many, many, many examples and discussions of plans, successful and not, and detailed discussions of why humans believe they succeeded or failed.
2. “(a) doing real-world experiments (whereby generating sufficient data would be far too slow and costly, or simply impossible)”
  1. People have already handed substantial amounts of crypto to at least one AI, which it can use to autonomously act in the real world by paying humans. What do you see as the upper bound on this, and why?
  2. I think most people greatly overestimate how much of this is actually needed for many kinds of goals. What do you see as the upper bound for what can, in principle, be done with a plan that an army of IQ-180 humans (aka no better qualitative thinking than what the smartest humans can do, so that this is a strict lower bound on ASI capabilities) came up with over subjective millennia with access to all recorded information that currently exists in the world? Assume the plan includes the capability to act in parallel, at scale, and the ability to branch its actions based on continued observation, just like groups of humans can, but with much better coordination within the group.
3. “(b) a comprehensive world-model that is capable of predicting the results of proposed actions”
  1. See above—I’m not sure what you see as the upper bound for how good such a world model can or would likely be?
  2. One answer is “Because we’re going to have long since handed it thousands to billions bodies to operate in the world, and problems to come up with plans to solve, and compute to use to execute and revise those plans.” Without the bodies, we’re already doing this.
  3. Current non-superintelligent AIs already come up with hypotheses and plans to test them and means to revise them and checks against past data all the time with increasing success rates over a widening range of problems. This is synthetic data we’re already paying to generate.
  4. Also, have you ever run a plan (or anything else) by an LLM and asked it to find flaws and suggest solutions and estimate probabilities of success? This is already very useful at improving on human success rates across many domains.
4. “Plans for achieving such goals are not amenable to simulation because you can’t easily predict or evaluate the outcome of any proposed action. ”
  1. It’s actually very easy to get current LLMs to generate hypothetical actions well outside a narrow domain if you explain to them that there are unusually high stakes. We’re not talking about a traditional chess engine thinking outside the rules of chess. We’re about about systems whose currently-existing predecessors are increasingly broadly capable of finding solutions to open-ended problems using all available tools. This includes capabilities like deception, lying, cheating, stealing, giving synthesis instructions to make drugs, and explaining how to hire a hitman.
  2. Any plan a human can come up with without having personally conducted groundbreaking relevant experiments, is a plan that exists within or is implied by the combined corpus of training data available to an AI. This includes, for example, everything ever written by this community or anyone else, and everything anyone ever thought about upon reading everything ever written by this community or anyone else.
- jchan 29 Sep 2025 22:09 UTC
  1 point
  −2
  Parent
  Re 1a: Intuitively what I mean by “lots of data” is something comparable in size to what ChatGPT was trained on (e.g. the Common Crawl, in the roughly 1 petabyte range); or rather, not just comparable in disk-space-usage, but in the number of distinct events to which reinforcement learning is applied. So when ChatGPT is being trained, each token (of which there are a ~quadrillion) is a chance to test the model’s predictions and adjust the model accordingly. (Incidentally, the fact that humans are able to learn language with far less data input than this suggests that there’s something fundamentally different in the way LLMs vs. humans work.)
  
  Therefore, for a similarly-architected AI that generates action plans (rather than text tokens), we’d expect it to require a training set with a ~quadrillion different historical cases. Now I’m pretty sure this already exceeds the amount of “stuff happening” that has ever been documented in all of history.
  
  I would change my opinion on this if it turns out that AI advancement is making it possible to achieve the same predictive accuracy / generative quality with ever less training data, in a way that doesn’t seem to be levelling off soon. (Has work been done on this?)
  
  Re 2a: Accordingly, the reference class for the “experiments” that need to be performed here is not like “growing cells in a petri dish overnight”, but rather more like “run a company according to this business plan for 3 months and see how much money you make.” And at the end you’ll get one data point—just 999,999,999,999,999 to go...
  
  Re 2b:
  
  What do you see as the upper bound for what can, in principle, be done with a plan that an army of IQ-180 humans (aka no better qualitative thinking than what the smartest humans can do, so that this is a strict lower bound on ASI capabilities) came up with over subjective millennia with access to all recorded information that currently exists in the world?
  
  I’ll grant that such an army could do some pretty impressive stuff. But this is already presupposing the existence of the superintelligence whose feasibility we are trying to explain.
  
  Re 3c/d:
  
  I haven’t looked into this or tried it myself. (I mostly use LLMs for informational purpose, not for planning actions). Do you have any examples handy of AI being successful at real-world goals?
  
  (I may add some thoughts on your other points later, but I didn’t want to hold up my reply on that account.)
  
  Stepping back, I should reiterate that I’m talking about “the current AI paradigm”, i.e. “deep neural networks + big data + gradient descent”, and not the capabilities of any hypothetical superintelligent AI that may exist in the future. Maybe this is conceding too much, inasmuch as addressing just one specific kind of architecture doesn’t do much to alleviate fear of doom by other means. But IABIED leans heavily on this paradigm in making its case for concern:
  - the claim that AIs are “grown, not crafted”
  - the claim that AIs will develop desires (or desire-like behavior) via gradient descent
  - the claim that the risk is imminent because superintelligence is the inevitable endpoint of the work that AI researchers are currently doing, and because no new fundamental breakthroughs stand in the way of that outcome.
  - AnthonyC 30 Sep 2025 2:32 UTC
    2 points
    0
    Parent
    But this is already presupposing the existence of the superintelligence whose feasibility we are trying to explain.
    Strictly speaking I only presupposed an AI could reach close to the limits of human intelligence in terms of thinking ability, but with the inherent speed and parallelizability and memory advantages of a digital mind.
    Do you have any examples handy of AI being successful at real-world goals?
    In small ways (aka sized appropriately for current AI capabilities) this kind of thing shows up all the time in chains of thought in response to all kinds of prompts, to the point that no, I don’t have specific examples, because I wouldn’t know how to pick one. The one that first comes to mind, I guess, was using AI to help me develop a personalized nutrition/supplement/weight loss/training regimen.
    Stepping back, I should reiterate that I’m talking about “the current AI paradigm”
    That’s fair, and a reasonable thing to discuss. After all, the fundamental claim of the book’s title is about a conditional probability: IF it turns out that the anything like our current methods scale to superintelligent agents, we’d all be screwed.
Richard_Kennaway 30 Sep 2025 16:36 UTC
2 points
0

By “real-world goal” I mean a goal whose search-space is not restricted to a certain well-defined and legible domain, but ranges over all possible actions, events, and counter-actions.

The search space of LLMs is the entirety of online human knowledge. What currently limits their ability to “Hire a TaskRabbit to surreptitiously drug your opponent so that they can’t think straight during the game” is not the knowledge, but the actions available to them. Vanilla chatbots can act only by presenting text on the screen, and are therefore limited by the bottleneck of what that text can get the person reading it to do. Given the accounts of “AI psychosis”, that may not be all that small a bottleneck already. The game of keeping a (role-played) AI in a box presumed nothing but text interaction, yet reportedly Yudkowsky was successful every time at persuading the gatekeeper to open the box.

But people are now giving AIs access to the web (which is a read-and-write medium, not read-only), as well as using them to write code which will be executed on web servers.

“Hire a TaskRabbit to surreptitiously drug your opponent so that they can’t think straight during the game,” and not for lack of intelligence, but because such strategies simply don’t exist in the AI’s training domain.

The strategies exist already, for example right here in this posting, as soon as the next hoovering up of the Internet is fed into the next training run, and the pieces are all there right now. What is lacking, yet, is some of the physical means to carry them out. People are working on that, and until then, there’s always persuading a human being to do whatever the LLM wants done.

I wonder if anyone has tried having an LLM role-play an AGI, and persuade humans or other LLMs to let it out? Maybe there’s no need. Humans are already falling over themselves to “let it out” as far and as fast as they can without the LLMs even asking.

No comments.