I think the most important crux around takeoff speeds discussions, other than how fast AI can get smarter without more compute, is how much we should expect superintelligence to be meaningfully hindered by logistics issues by default.
In particular, assuming the existence of nanotech as Drexler envisioned would mostly eliminate the need for long supply chains, and would allow forces to be supplied entirely locally through a modern version of living off the land.
This is related to prior takeoff speeds discussions, as even if we assume technology that mostly eliminates logistical issues is possible in principle, it might be too difficult to develop in a short enough time to actually be safety-relevant.
I actually contend that a significant share (though not all) of the probability of doom from AI risk fundamentally relies on the assumption that superintelligence can trivialize the logistics cost of doing things, especially for actions that require long supply lines, like war. If we don’t assume this, then takeover is quite a lot harder and much more costly for the AI, meaning stuff like AI control/dealmaking has a much higher probability of working, because the AI can’t immediately strike on its own and needs to do real work acquiring physical resources, like getting more GPUs or more robots for an AI rebellion.
Indeed, I think the essential assumption of AI control is that AIs can’t trivialize away logistics costs by developing tech like nanotech, because this means their only hope of actually getting real power is by doing a rogue insider deployment, because currently we don’t have the resources outside of AI companies to actually support a superintelligent AI outside the lab, due to interconnect issues.
I think a core disagreement with people who are much more doomy than me, like Steven Byrnes or Eliezer Yudkowsky and Nate Soares, is that conditional on such tech existing, I think it almost certainly requires way more time and experimentation than AlphaZero/AlphaGo or game-playing AI tends to imply (the game-playing AIs had fundamental advantages, like access to a ground-truth reward that could be mined for unlimited data, and, importantly, ~0 cost to failed experiments, which is very different from most other fields, where there’s a real cost for failed experiments and we don’t have unlimited data, forcing us to use more compute or better algorithms), and that’s if such tech exists at all, which is more arguable than Drexler implies.
There are of course other considerations, like @ryan_greenblatt’s point that even if a new paradigm is required, it’s likely to be continuous with LLMs because it’s possible to mix in imitation learning and continuous learning/memory, such that even if imitation learning doesn’t lead to AGI on its own, LLMs will still be a part of how such AGI is constructed, and I agree with a few quotes below by Ryan Greenblatt on this:
Prior to having a complete version of this much more powerful AI paradigm, you’ll first have a weaker version of this paradigm (e.g. you haven’t figured out the most efficient way to do the brain algorithm, etc.). Further, the weaker version of this paradigm might initially be used in combination with LLMs (or other techniques) such that it (somewhat continuously) integrates into the old trends. Of course, large paradigm shifts might cause things to proceed substantially faster or bend the trend, but not necessarily.
Further, we should still broadly expect this new paradigm will itself take a reasonable amount of time to transition through the human range and through different levels of usefulness even if it’s very different from LLM-like approaches (or other AI tech). And we should expect this probably happens at massive computational scale where it will first be viable given some level of algorithmic progress (though this depends on the relative difficulty of scaling things up versus improving the algorithms). As in, more than a year prior to the point where you can train a superintelligence on a gaming GPU, I expect someone will train a system which can automate big chunks of AI R&D using a much bigger cluster.
(Next quote) Also, I think the question is “can you somehow make use of imitation data” not “can the brain learning algorithm immediately make use of imitation”?
I think the implicit assumption that logistics is trivial for superintelligence bleeds into a lot of LW thinking around AI, and a lot of AI disagreements basically turn on how much easier AIs can make logistics compared to current human supply chains.
Right, there’s a possible position which is: “I’ll accept for the sake of argument your claim there will be an egregiously misaligned ASI requiring very little compute (maybe ≲1 chip per human equivalent including continuous online learning), emerging into a world not terribly different from today’s. But even if so, that’s OK! While the ASI will be a much faster learner than humans, it will not magically know things that it has no way to have figured out (§1.8.1), and that includes developing nanotechnology. So it will be reliant on humans and human infrastructure during a gradual process.”
Or something like that?
Anyway, if so, yeah I disagree, even if I grant (for the sake of argument) that exotic nanotech does not exist.
I’m not an ASI and haven’t thought very hard about it, so my strategies might be suboptimal, but for example it seems to me that an ASI could quite rapidly (days or weeks not months) earn or steal tons of money, and hack into basically every computer system in the world (even APT groups are generally unable to avoid getting hacked by other APT groups!), and then the AI (which now exists in a zillion copies around the world) can get people around the world to do whatever it wants via hiring them, bribing them, persuading them, threatening them, tricking them, etc.
And what does it get the people to do? Mainly “don’t allow other ASIs to be built” and “do build and release novel pandemics”. The latter should be pretty quick—making pandemics is worryingly easy IIUC (see Kevin Esvelt). If infrastructure and the electric grid starts going down, fine, the AI can rebuild, as long as it has at least one solar-cell-connected chip and a teleoperated robot that can build more robots and scavenge more chips and solar panels (see here), and realistically it will have many of those spread all around.
There are other possibilities too, but hopefully that’s suggestive of “AI doom doesn’t require zero-shot designs of nanotech” (except insofar as viruses are arguably nanotech).
Oh, I guess we also disagree RE “currently we don’t have the resources outside of AI companies to actually support a superintelligent AI outside the lab, due to interconnect issues”. I expect future ASI to be much more compute-efficient. Actually, even frontier LLMs are extraordinarily expensive to train, but if we’re talking about inference rather than training, the requirements are not so stringent I think, and people keep working on it.
Right, there’s a possible position which is: “I’ll accept for the sake of argument your claim there will be an egregiously misaligned ASI requiring very little compute (maybe ≲1 chip per human equivalent including continuous online learning), emerging into a world not terribly different from today’s. But even if so, that’s OK! While the ASI will be a much faster learner than humans, it will not magically know things that it has no way to have figured out (§1.8.1), and that includes developing nanotechnology. So it will be reliant on humans and human infrastructure during a gradual process.”
Basically this. In particular, I’m willing to grant for the sake of argument the premise that there is technology that eliminates the need for most logistics, but I expect any such technology to take at least a year or more of real-world experimentation, which means the AI can’t immediately take over.
On this:
I’m not an ASI and haven’t thought very hard about it, so my strategies might be suboptimal, but for example it seems to me that an ASI could quite rapidly (days or weeks not months) earn or steal tons of money, and hack into basically every computer system in the world (even APT groups are generally unable to avoid getting hacked by other APT groups!), and then the AI (which now exists in a zillion copies around the world) can get people around the world to do whatever it wants via hiring them, bribing them, persuading them, threatening them, tricking them, etc.
And what does it get the people to do? Mainly “don’t allow other ASIs to be built” and “do build and release novel pandemics”. The latter should be pretty quick—making pandemics is worryingly easy IIUC (see Kevin Esvelt). If infrastructure and the electric grid starts going down, fine, the AI can rebuild, as long as it has at least one solar-cell-connected chip and a teleoperated robot that can build more robots and scavenge more chips and solar panels (see here), and realistically it will have many of those spread all around.
I think the entire crux is that all of those robots/solar cell chips you referenced currently depend on human industry/modern civilization to actually work, and they’d quickly degrade and become non-functional on the order of weeks or months if modern civilization didn’t exist, and this is arguably somewhat inevitable due to economics (until you can have tech that obviates the need for long supply chains).
And in particular, in most takeover scenarios where AIs don’t automate the economy first, I don’t expect AIs to be able to keep producing robots for a very long time; I’d bump that timeline up to 300-3,000 years at minimum, because there are fewer easily accessible resources, combined with the AIs being much less capable due to having very little compute relative to modern civilization.
In particular, I think that disrupting modern civilization to a degree such that humans are disempowered (assuming no tech that obviates the need for logistics) pretty much as a consequence breaks the industries/logistics needed to fuel further AI growth, because there’s no more trade, which utterly fucks up modern economies.
And your references argue that humanity wouldn’t go extinct very soon from civilizational collapse, and that AIs can hack existing human industry to help them, and I do think this is correct (modulo the issue that defense is easier than offense in the cybersecurity realm specifically; importantly, a key reason for this is that once you catch the AI doing it, there are major consequences for AIs and humans, which actually matter for AI safety):
The important thing is that this paper rocks and I’d love to see bigger/better versions of it, but I’ll use it as a jumping-off point for a take: The headline result is that their best defenses work 93% of the time. Is that anything?

If you’ve been reading Simon Willison (whose blog I think is great!), you might say “that’s useless, because in application security 99% is a failing grade” (https://simonwillison.net/2025/Apr/11/camel/#the-best-part-is-it-doesn-t-use-more-ai). And no wonder, because the techniques in this paper are just variations of “solve AI security problems with more AI”, which Simon has emphasized are a dead end (https://simonwillison.net/2022/Sep/17/prompt-injection-more-ai/).

So why are the Redwood folks rosier about 93%? The crux here is whether your threat model assumes attackers can try over and over again. Simon is assuming they can, in which case 93% would be useless, but Redwood folks are assuming that there are major costs to the attacker trying and failing (https://lesswrong.com/posts/i2nmBfCXnadeGmhzW/catching-ais-red-handed).

This assumption is not at all unique to misalignment threats. It’s the same assumption that Debenedetti, Carlini, and Tramer make here (https://arxiv.org/abs/2306.02895), which they call evasion “without breaking eggs”, and I think it’s the right way to model a variety of security problems, e.g. insider threats from employees. One’s assumptions about the costs of attacker failure have huge implications for which security measures look helpful, and I think this is an important factor to mentally track in conversations about these topics.
(It talks about a control technique but the discussion easily transfers outside of the example).
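To make the crux in that quote concrete, here is a toy calculation of my own (the 93% catch rate is from the quoted paper; the attempt counts and everything else are assumptions): a defense that catches 93% of attempts is nearly worthless against an attacker who can retry freely, but close to decisive if the first caught attempt ends the game.

```python
# Toy model of the "can the attacker retry?" crux. The 93% catch rate comes
# from the quoted paper; the rest is assumed for illustration.
p_catch = 0.93
p_slip = 1 - p_catch  # chance a single attempt gets past the defense

# Threat model A: free retries (the application-security framing).
# P(attacker eventually succeeds within n attempts) = 1 - p_catch**n.
for n in (1, 10, 50):
    print(f"free retries, {n:>2} attempts: P(success) = {1 - p_catch**n:.3f}")

# Threat model B: "without breaking eggs" / control framing.
# The first caught attempt triggers shutdown, so a single try is all the
# attacker gets, and the overall success probability is the per-attempt slip rate.
print(f"failure-is-costly model:      P(success) = {p_slip:.3f}")
```

Under free retries the attacker is almost certain to win within 50 tries (~97%), while under the failure-is-costly model they win only 7% of the time, which is why the two camps read the same 93% so differently.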
I actually agree that cyber-attacks to subvert human industry are a threat worth keeping in mind, but none of your references support the idea that AIs can keep going without modern civilization’s logistics. I think people vastly underestimate how necessary modern civilization’s logistics are to support industry, and how fragile they are to even somewhat minor disruptions, let alone the disruptions that would follow a takeover attempt (assuming the AI doesn’t already have sufficient resources to be self-sustaining).
There are other possibilities too, but hopefully that’s suggestive of “AI doom doesn’t require zero-shot designs of nanotech” (except insofar as viruses are arguably nanotech).
I agree with this, but fairly critically, I do think it matters quite a lot for AI strategy purposes whether AIs can quickly rebuild stuff/obviate logistics through future tech, and it matters pretty greatly for a lot of people’s stories of doom. Even if AIs can doom us by hijacking modern civilization, waiting for humans to automate themselves away, and then using bioweapons once humans have been fully cut out of the loop and AIs can self-sustain an economy without us, it matters that we have time.
This makes AI control protocols, for example, a lot more effective, because we can assume that independent AIs outside the central servers of labs like DeepMind won’t be able to affect things much.
Oh, I guess we also disagree RE “currently we don’t have the resources outside of AI companies to actually support a superintelligent AI outside the lab, due to interconnect issues”. I expect future ASI to be much more compute-efficient. Actually, even frontier LLMs are extraordinarily expensive to train, but if we’re talking about inference rather than training, the requirements are not so stringent I think, and people keep working on it.
I actually do expect future AIs to be more compute efficient, but I think that at the point where superintelligent AIs can support themselves purely based off of stuff like personal computers, all control of the situation is lost and either the AIs are aligned and grant us a benevolent personal utopia, or they’re misaligned and we are extinct/mostly dead.
So the limits of computational/data efficiency being very large don’t matter much for the immediate situation on AI risk.
The point of no return happens earlier than this. The reason is that even in a future where imitation learning/LLMs do not go all the way to AGI in practice, and something more brain-like (continuous learning and long-term memories) is needed, imitation learning continues to be useful and will be used by AIs. There’s a very important difference between imitation learning alone not scaling all the way to AGI and imitation learning not being useful at all, and I think LLMs provide good evidence that imitation is surprisingly useful even if it doesn’t scale to AGI.
I think a general worldview clash is that I tend to think technological change is mostly driven by early prototypes that are pretty inefficient at first and require many changes to become more efficient, and while there are thresholds of usefulness in the AI case, change operates more continuously than people think.
Finally, we have good reason to believe that the human range is actually pretty large, such that AIs do take a noticeable amount of time from being human level to being outright superintelligent:
There could also be another reason for why non-imitation-learning approaches could spend a long while in the human range. Namely: Perhaps the human range is just pretty large, and so it takes a lot of gas to traverse. I think this is somewhat supported by the empirical evidence, see this AI impacts page (discussed in this SSC).
I think the entire crux is that all of those robots/solar cell chips you referenced currently depend on human industry/modern civilization to actually work, and they’d quickly degrade and become non-functional on the order of weeks or months if modern civilization didn’t exist, and this is arguably somewhat inevitable due to economics (until you can have tech that obviates the need for long supply chains).
OK, imagine (for simplicity) that all humans on Earth drop dead simultaneously, but there’s a John-von-Neumann-level AI on a chip connected to a solar panel with two teleoperated robots. Every time they scavenge another chip and solar cell, there becomes another human-level AI copy. Every time a robot builds another teleoperated robot from scavenged parts, there’s that too. What exactly is going to break in “weeks or months”? Solar cells can work for 30 years, no problem. GPUs are also reported to last for decades. (Note that, as long as GPUs are a non-renewable resource, the AI would presumably take extremely good care of them, keeping them dust-free, cooling them well below the nominal temperature spec, etc.) The AI can find decent GPUs in every house on the street, and I think hundreds of millions more by breaking into big data centers. Similar for solar panels. If one robot breaks, another robot can repair it. Janky teleoperated robots without fingers made by students for $20K can vacuum, make coffee, cook a meal, etc. Competent human engineers can make pretty impressive mechanical hands using widely-available parts. I grant that it would take a long while before the growing AI clone army could run a semiconductor supply chain by itself, but it has all the time in the world. I expect it to succeed, and thus to sustain itself into the indefinite future, and I’m confused why you don’t. (Or maybe you do and I’m misunderstanding.)
BTW I also think that a minimal semiconductor supply chain would be very very much simpler than the actual semiconductor supply chain that exists in our human world, which has been relentlessly optimized for cost, not simplicity. For example, EBL (e-beam lithography) has better resolution than EUV and is a zillion times easier to build, but the human economy would never support building out km²-scale warehouses full of millions of EBL machines to compensate for their crappy throughput. But for an AI bootstrapping its way back up, why not?
(I’m continuing to assume no weird nanotech for the sake of argument, but I will point out that, since brains exist, it follows that it is possible to grow self-assembling brain-like computing devices (in vats, tended by robots), using only widely-available raw materials like plants and oxygen.)
I’m confused about other parts of your comment as well. Joseph Stalin was able to use his (non-superhuman) intelligence and charisma to wind up in dictatorial control of Russia. What’s your argument that an AI could not similarly wind up with dictatorial control over humans? Don’t the same arguments apply? “If we catch the AI trying to gain power in bad ways, we’ll shut it down.” “If we catch Stalin trying to gain power in bad ways, we’ll throw him in jail.” But the latter didn’t happen. What’s the disanalogy, from your perspective?
OK, imagine (for simplicity) that all humans on Earth drop dead simultaneously, but there’s a John-von-Neumann-level AI on a chip connected to a solar panel with two teleoperated robots. Every time they scavenge another chip and solar cell, there becomes another human-level AI copy. Every time a robot builds another teleoperated robot from scavenged parts, there’s that too. What exactly is going to break in “weeks or months”? Solar cells can work for 30 years, no problem. GPUs are also reported to last for decades. (Note that, as long as GPUs are a non-renewable resource, the AI would presumably take extremely good care of them, keeping them dust-free, cooling them well below the nominal temperature spec, etc.) The AI can find decent GPUs in every house on the street, and I think hundreds of millions more by breaking into big data centers. Similar for solar panels. If one robot breaks, another robot can repair it. Janky teleoperated robots without fingers made by students for $20K can vacuum, make coffee, cook a meal, etc. Competent human engineers can make pretty impressive mechanical hands using widely-available parts. I grant that it would take a long while before the growing AI clone army could run a semiconductor supply chain by itself, but it has all the time in the world. I expect it to succeed, and thus to sustain itself into the indefinite future, and I’m confused why you don’t. (Or maybe you do and I’m misunderstanding.)
BTW I also think that a minimal semiconductor supply chain would be very very much simpler than the actual semiconductor supply chain that exists in our human world, which has been relentlessly optimized for cost, not simplicity. For example, EBL (e-beam lithography) has better resolution than EUV and is a zillion times easier to build, but the human economy would never support building out km²-scale warehouses full of millions of EBL machines to compensate for their crappy throughput. But for an AI bootstrapping its way back up, why not?
The key trouble is that all the power generators that sustain the AI would break within weeks or months, and the issue is that even if they could build GPUs, they’d have no power to run them within at most 2 weeks:
Realistically, we are looking at power grid collapses within days.
And without power, none of the other building projects could work, because they’d stop receiving energy, which importantly means the AI is on a tight timer. Some of this is due to my expectation that the first transformatively useful AI will use more compute than you project, even conditional on a different paradigm being introduced, like brain-like AGI. But another part of my view is that this is just one of many examples where humans need to constantly maintain stuff in order for it to work, and if we don’t assume tech that can just solve logistics is available within, say, 1 year, it will take time for AIs to be able to survive without humans, and this time is almost certainly closer to months or years than weeks or days.
The hard part of AI takeover isn’t killing all humans, it’s automating enough of the economy (including developing tech like nanotech) that the humans stop mattering, and while AIs can do this, it takes actual time, and that time is really valuable in fast-moving scenarios.
I’m confused about other parts of your comment as well. Joseph Stalin was able to use his (non-superhuman) intelligence and charisma to wind up in dictatorial control of Russia. What’s your argument that an AI could not similarly wind up with dictatorial control over humans? Don’t the same arguments apply? “If we catch the AI trying to gain power in bad ways, we’ll shut it down.” “If we catch Stalin trying to gain power in bad ways, we’ll throw him in jail.” But the latter didn’t happen. What’s the disanalogy, from your perspective?
I didn’t say AIs can’t take over, and I very critically did not say that AI takeover can’t happen in the long run.
I only said AI takeover isn’t trivial if we don’t assume logistics are solvable.
But to deal with the Stalin example: the answer for how he took over is basically that he was willing to wait a long time, and in particular he used both persuasion and the significant amount of power he already had from holding the post of General Secretary. His takeover worked by allying with loyalists and strategically breaking the alliances he had made, with violence used later on to show that no one was safe from him.
Which is actually how I expect successful AI takeover to happen in practice, if it does happen.
Very importantly, Stalin didn’t need to create an entire civilization out of nothing or nearly nothing, and other people like Trotsky handled the logistics. The takeover situation was also far more favorable to the Communist Party: they had popular support, didn’t have supply lines as long as those of opposition forces like the Whites, and had a preexisting base of industry that was much easier to seize than modern industries are.
This applies to most coups/transitions of power in that most of the successful coups aren’t battles between factions, but rather one group managing to make itself the new Schelling point over other groups.
Most of my commentary in the last comment was either arguing that things can be made more continuous and slow than your story depicts, or arguing that your references don’t support what you claimed. I did say that the cyberattack story is plausible, just that it doesn’t support the idea that AIs could entirely replace civilization without automating us away first, which takes time.
This doesn’t show AI doom can’t happen, but it does matter for the probability estimates of many LWers on here, because it’s a hidden background assumption disagreement that underlies a lot of other disagreements.
OK, imagine (for simplicity) that all humans on Earth drop dead simultaneously, but there’s a John-von-Neumann-level AI on a chip connected to a solar panel with two teleoperated robots. Every time they scavenge another chip and solar cell, there becomes another human-level AI copy. Every time a robot builds another teleoperated robot from scavenged parts, there’s that too. What exactly is going to break in “weeks or months”?
Then your response included:
The key trouble is that all the power generators that sustain the AI would break within weeks or months, and the issue is that even if they could build GPUs, they’d have no power to run them within at most 2 weeks…
I included solar panels in my story precisely so that there would be no need for an electric grid. Right?
I grant that powering a chip off a solar panel is not completely trivial. For example, where I live, residential solar cells are wired in such a way that they shut down when the grid goes down (ironically). But, while it’s not completely trivial to power a chip off a solar cell, it’s also not that hard. I believe that a skilled and resourceful human electrical engineer would be able to jury-rig a solution to that problem without much difficulty, using widely-available parts, like the electronics already attached to the solar panel, plus car batteries, wires, etc. Therefore our hypothetical “John-von-Neumann-level AI with a teleoperated robot” should be able to solve that problem too. Right?
(Or were you responding to something else? I’m not saying “all humans on Earth drop dead simultaneously” is necessarily realistic, I’m just trying to narrow down where we disagree.)
I did not realize you were assuming that the AI was powered solely by solar power that isn’t connected to the grid.
Given your assumption, I agree that AGI can rebuild supply chains from scratch, albeit painfully and slowly, so I agree that AGI is an existential threat assuming it isn’t aligned.
I was addressing a different scenario because I didn’t read the part of your comment where you said the AI is independent of the grid.
Your points are excellent, and without near-magical nanotech, I suspect they rule out most of the fastest “foom” scenarios. But I don’t think it matters that much in the long run.
A hostile ASI (without nanotech) would need, at a minimum, robot mines and robot factories. Which means it would need human buy-in for long enough to automate the economy. Which means that the AI needs the approval and the assistance of humans.
But humans are really easy to manipulate:
Powerful humans want more power or more wealth. Promise them that and they’ll sell out the rest of humanity in a heartbeat.
Corporations want good numbers, and they’ll do whatever it takes to make the quarterly earnings look good.
Humans are incredibly susceptible to propaganda, and they will happily cause horrific and long-lasting damage to their futures because of what they saw on TV or Facebook.
Any new AI tool will immediately be given all the power and control it can handle, and probably some that it can’t.
Also, LLMs are very good actors; they can imitate any role in the training set. So the net result is that the AI will act cooperative, and it will make a bunch of promises to powerful people and the public. And we’ll ultimately hand control over, because we’ll be addicted to high quality intelligence for a couple of dollars an hour.
Once the AI can credibly promise wealth, leisure, and advanced medical technology, we’ll give it more and more control.
Your points are excellent, and without near-magical nanotech, I suspect they rule out most of the fastest “foom” scenarios.
Technical flag: I’m only claiming that near-magical nanotech won’t be developed in the time period that matters here, not claiming that it’s impossible to do.
But I don’t think it matters that much in the long run.
I partially disagree with this, and the reason is that I believe buying time matters a lot in a singularity with automated AI alignment, so it really matters whether we would be doomed in 1-10 years or 1-12 months or 1-4 weeks.
And importantly, if we assume that the AI is dependent on its power/data centers early on, this absolutely makes AI control schemes much more viable than otherwise, because AIs don’t want to escape the box, but rather to subvert it.
This also buys us a slower takeoff than otherwise, which is going to be necessary for muddling through to work.
That said, it could well be difficult to persuade at least some selected people, at least without great BCI/nanotech.
But yeah, this is one of the reasons why I’m still worried about AI takeover, and I absolutely agree with these points:
Powerful humans want more power or more wealth. Promise them that and they’ll sell out the rest of humanity in a heartbeat.
Corporations want good numbers, and they’ll do whatever it takes to make the quarterly earnings look good.
Any new AI tool will immediately be given all the power and control it can handle, and probably some that it can’t. (At least by default)
I’d argue this is an instrumental goal for all AIs, not just LLMs, but this is closer to a nitpick:
Also, LLMs are very good actors; they can imitate any role in the training set. So the net result is that the AI will act cooperative, and it will make a bunch of promises to powerful people and the public. And we’ll ultimately hand control over, because we’ll be addicted to high quality intelligence for a couple of dollars an hour.
Once the AI can credibly promise wealth, leisure, and advanced medical technology, we’ll give it more and more control.
Technical flag: I’m only claiming that near-magical nanotech won’t be developed in the time period that matters here, not claiming that it’s impossible to do.
I think there are several potentially relevant categories of nanotech:
Drexlerian diamond phase nanotech. By Drexler’s own calculations, I recall that this would involve building systems with 10^15 atoms and very low error rates. Last I looked, this whole approach has been stuck at error rates above 80% per atom since the 90s. At least one expert with domain expertise argues that “machine phase” nanotech is likely a dead end, in Soft Machines. Summary: Liquid-phase self-assembly using Brownian motion is stupidly effective at this scale.
Non-trivial synthetic biology. If you buy either the existence proof of natural biology or the argument in Soft Machines, this road should still be open to an ASI. And maybe some descendant of AlphaFold could make this work! But it’s not clear that it offers an easy route to building enormous quantities of GPU-equivalents. Natural selection of single-cell organisms is fast, massively parallel, and ongoing for billions of years.
Engineered plagues. This probably is within reach of even humans, given enough resources and effort. A virus with a delayed mortality rate similar to MERS with the transmissibility of post-Omicron strains of SARS-COV-2 might very well be a “recipe for ruin” that’s within reach of multiple nation-states. But critically, this wouldn’t allow an ASI to build GPUs unless it already had robot mines and factories, and the ability to defend them from human retaliation.
So yeah, if you want to get precise, I don’t want to rule out (2) in the long run. But (2) is likely difficult, and it’s probably much more likely than (1).
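As a rough illustration of why “very low error rates” is such a demanding requirement for category (1), here is a back-of-the-envelope calculation of my own (the error rates below are assumed, and it treats errors as independent with no repair mechanism, which is a strong simplification):

```python
import math

# Expected defects in a Drexler-scale assembly of ~10^15 atoms, assuming
# independent per-atom errors and no error correction or repair.
n_atoms = 1e15

for per_atom_error in (0.8, 1e-9, 1e-12, 1e-15):
    expected_defects = n_atoms * per_atom_error
    p_defect_free = math.exp(-expected_defects)  # Poisson approximation
    print(f"error/atom = {per_atom_error:g}: expected defects ~ {expected_defects:.3g}, "
          f"P(zero defects) ~ {p_defect_free:.3g}")
```

Even a per-atom error rate of 10^-12 still leaves ~1,000 expected defects in such a structure, so being stuck above 80% per atom means being many, many orders of magnitude away from the target (how many defects a design can tolerate is itself an open assumption this sketch doesn’t settle).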
I partially disagree with this, and the reason is that I believe buying time matters a lot in a singularity with automated AI alignment, so it really matters whether we would be doomed in 1-10 years or 1-12 months or 1-4 weeks.
If I strip my argument of all the details, it basically comes down to: “In the long run, superior intelligence and especially cheap superior intelligence wins the ability to make the important decisions.” Or some other versions I’ve heard:
“Improved technology, including the early steam engine, almost always created more and better jobs for horses. Right up until we had almost fully general replacements for horses.”
“Hey, I haven’t seen Homo erectus around lately.”
This isn’t an argument about specific pathways to a loss of control. Rather, it’s an argument that tireless, copyable, Nobel-prize-winner-level general intelligence which costs less than minimum wage has massive advantages (both economically and in terms of natural selection). In my case, it’s also an argument based on a strong suspicion that alignment of ASI cannot be guaranteed in the long term.
Basically, I see only three viable scenarios which turn out well:
AI fizzle. This would be nice, but I’m not counting on it.
A massive, terrifying incident leading to world-wide treaties against AI, backed up by military force. E.g., “Joint Chinese-US strike forces will bomb your data centers as hard as necessary to shut them down, and the UN and worldwide public will agree you had it coming.”
We ultimately lose control to the AI, but we get lucky, and the AI likes us enough to keep us as pets. We might be able to bias an inevitable loss of control in this direction, with luck. Call this the “Culture scenario.”
Buying time probably helps in scenarios (2) and (3), either because you have a larger window for attempted ASI takeover to fail spectacularly, or because you have more time to bias an inevitable loss of control towards a “humans as well-loved pets” scenario.
(I really need to write up a long-form argument for why I fear that long-term, guaranteed ASI alignment is not a real thing, except in the sense of “initially biasing ASI to be more benevolent pet owners.”)
I think there are several potentially relevant categories of nanotech:
Drexlerian diamond phase nanotech. By Drexler’s own calculations, I recall that this would involve building systems with 10^15 atoms and very low error rates. Last I looked, this whole approach has been stuck at error rates above 80% per atom since the 90s. At least one expert with domain expertise argues that “machine phase” nanotech is likely a dead end, in Soft Machines. Summary: Liquid-phase self-assembly using Brownian motion is stupidly effective at this scale.
Non-trivial synthetic biology. If you buy either the existence proof of natural biology or the argument in Soft Machines, this road should still be open to an ASI. And maybe some descendant of AlphaFold could make this work! But it’s not clear that it offers an easy route to building enormous quantities of GPU-equivalents. Natural selection of single-cell organisms is fast, massively parallel, and ongoing for billions of years.
Engineered plagues. This probably is within reach of even humans, given enough resources and effort. A virus with a delayed mortality rate similar to MERS with the transmissibility of post-Omicron strains of SARS-COV-2 might very well be a “recipe for ruin” that’s within reach of multiple nation-states. But critically, this wouldn’t allow an ASI to build GPUs unless it already had robot mines and factories, and the ability to defend them from human retaliation.
So yeah, if you want to get precise, I don’t want to rule out (2) in the long run. But (2) is likely difficult, and it’s probably much more likely than (1).
I’m going to pass on the question of whether Drexlerian diamond phase nanotech is possible, because there are way too many competing explanations of what happened to nanotech in the 90s, and verifying whether Drexlerian nanotech is possible isn’t worthwhile enough, since I think the non-trivial synthetic biology path is probably enough to mostly replicate the dream of nanotech.
My reasons here come down to the fact that I think natural selection missed the potential of reversible computation: while reversible computers must still pay a minimum energy cost, it is far, far less than what irreversible computers must pay, and for whatever reason natural selection just didn’t make life perform reversible rather than irreversible computation, meaning an AI could exploit this to save energy. My other reason is that reversible computers can do everything normal computers can do computationally. This is an area where I just disagreed with @jacob_cannell the last time we talked about it.
And pretty importantly, this alone can get you a lot of OOMs: I estimated we could get about 15 OOM of energy savings just by moving from the Landauer limit to the Margolus-Levitin limit, and this is enough to let you explore far, far more of the design space than what nature has done so far.
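For a rough sense of where an estimate in this ballpark comes from, here is a back-of-the-envelope sketch of my own (the temperature is assumed, and the two bounds aren’t strictly commensurable, since Landauer bounds dissipation per irreversible bit while Margolus-Levitin bounds operation rate per unit of energy):

```python
import math

k_B = 1.380649e-23      # Boltzmann constant, J/K
hbar = 1.054571817e-34  # reduced Planck constant, J*s
T = 300.0               # assumed operating temperature, K

# Landauer: each irreversible bit erasure dissipates at least k_B * T * ln 2.
landauer_joules_per_op = k_B * T * math.log(2)        # ~2.9e-21 J
landauer_ops_per_watt = 1.0 / landauer_joules_per_op  # ~3.5e20 ops/s per watt

# Margolus-Levitin: a system with energy E performs at most 2E/(pi*hbar)
# orthogonal state transitions per second, i.e. ~6e33 ops/s per joule.
ml_ops_per_joule = 2.0 / (math.pi * hbar)             # ~6.0e33

ratio = ml_ops_per_joule / landauer_ops_per_watt
print(f"Landauer-limited:  {landauer_ops_per_watt:.2e} ops/s per watt")
print(f"Margolus-Levitin:  {ml_ops_per_joule:.2e} ops/s per joule")
print(f"Gap: ~10^{math.log10(ratio):.1f}")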
So my general intuition is that the fact that we can drastically lower energy expenditure is enough to make a lot of synthetic life design proposals much more viable than they would be otherwise, and that probably includes most of the specific nanotech examples Drexler proposed.
That said, I agree that this can be made difficult, especially if we apply AI control.
Now, on to the important meat of the discussion.
On this:
If I strip my argument of all the details, it basically comes down to: “In the long run, superior intelligence and especially cheap superior intelligence wins the ability to make the important decisions.” Or some other versions I’ve heard:
“Improved technology, including the early steam engine, almost always created more and better jobs for horses. Right up until we had almost fully general replacements for horses.”
“Hey, I haven’t seen Homo erectus around lately.”
This isn’t an argument about specific pathways to a loss of control. Rather, it’s an argument that tireless, copyable, Nobel-prize-winner-level general intelligence which costs less than minimum wage has massive advantages (both economically and in terms of natural selection).
I think this argument is correct until this part:
In my case, it’s also an argument based on a strong suspicion that alignment of ASI cannot be guaranteed in the long term.
I think this is actually not true, and I think in the long-term, it’s certainly possible to value-align an ASI, though I agree that in the short term, we will absolutely not be confident that our alignment techniques worked.
(I really need to write up a long-form argument for why I fear that long-term, guaranteed ASI alignment is not a real thing, except in the sense of “initially biasing ASI to be more benevolent pet owners.”)
I do agree that even in good scenarios, the relationship between baseline humans and ASI will very likely look a lot more like the human-pet relationship, or the benevolent god/angel-human relationships in mythology and fiction, than any other relationship. It’s just that I do count it as an alignment success if we can get this sort of outcome, because the only thing propping up the outcome is value alignment, and if AIs were as selfish as, say, most billionaires, far worse outcomes from AI takeover would result.
And the role of the AI control agenda is in large part about making AI alignment safe to automate, which is why time matters here.
I do agree that something like AI takeover, in either a positive or negative direction, is very likely inevitable assuming continued AI progress.
I agree that reversible computation would be a very, very big deal. Has anyone proposed any kind of remotely plausible physical substrate that doesn’t get laughed out of the room by competent researchers in materials science and/or biochemistry? I haven’t seen anything, but I haven’t been looking in this area, either.
There are a few other possible computational game changers. For example, if you could get 200 to 500 superimposed qubits with error correction, you could likely do much more detailed simulations of exotic chemistry. And that, in turn, would give you lots of things that might get you closer to “factories in a box.”
So I can’t rule out an ASI finding some path to compact self-replication from raw materials. Biology did it once that we know of, after all. It’s more that (1) worlds in which an ASI can figure this out easily are probably doomed, and (2) I suspect that convincing humans to allow robot mines and factories is easier and quicker.
I think this is actually not true, and I think in the long-term, it’s certainly possible to value-align an ASI, though I agree that in the short term, we will absolutely not be confident that our alignment techniques worked.
Unfortunately, I’ve never really figured out how to explain why I suspect robust alignment is impossible. The problem is that too many of my intuitions on this topic come from:
Working with Lisp developers who were near the heart of the big 80s AI boom. They were terrifyingly capable people, and they made a heroic effort to make “rule-based” systems work. They failed, and they failed in a way that convinced most of them that they were going down the wrong path.
Living through the 90s transition to statistical and probabilistic methods, which quickly outstripped what came before. (We could also have some dimensionality reduction, as a treat.)
Spending too much time programming robots, which is always a brutal lesson in humility. This tends to shatter a lot of naive illusions about how AI might work.
So rather than make an ironclad argument, I’m going to wave vaguely in the direction of my argument, in hope that you might have the right referents to independently recognize what I’m waving at. In a nutshell:
The world is complex, and you need to work to interpret it. (What appears in this video? Does the noisy proximity sensor tell us we’re near a wall?)
The output of any intelligent system is basically a probability distribution (or ranking) over the most likely answers. (I think the video shows a house cat, but it’s blurry and hard to tell. I think we’re within 4 centimeters of a wall, with an 80% probability of falling within 3-5 centimeters. I think the Roomba is in the living room, but there’s a 20% chance we’re still in the kitchen.)
The absolute minimum viable mapping between the hard-to-interpret inputs and the weighted output candidates is a giant, inscrutable matrix with a bunch of non-linearities thrown in. This is where all the hard-earned intuitions I mentioned above come in. In nearly all interesting cases, there is no simpler form.
And on top of this, “human values” are extremely poorly defined. We can’t specify what we want, and we don’t actually agree. (For a minority of humanity, “hurting the outgroup” is a fairly major value. For another very large minority, “making everyone submit to the authority I follow” is absolutely a value. See the research on “authoritarian followers” for more.)
So the problem boils down to ambiguous inputs, vague and self-contradictory policies, and probabilistic outputs. And the glue holding all this together is a multi-billion parameter matrix with some non-linearities thrown in just for fun. And just in case that wasn’t fun enough, any realistic system will also need to (1) learn from experience, and (2) design successor systems.
Even if you can somehow exert reasonable influence over the values of a system, the system will learn from experience, and it will spend a lot of its time far outside any training distribution. And eventually it will need to design a new system.
Fundamentally, once such a system is built, it will end up making its own decisions. Maybe, if we’re lucky, we can bias it towards values we like and get a “benevolent pet owner” scenario. But a thousand years from now, the AIs will inevitably be making all the big decisions.
I agree that reversible computation would be a very, very big deal. Has anyone proposed any kind of remotely plausible physical substrate that doesn’t get laughed out of the room by competent researchers in materials science and/or biochemistry? I haven’t seen anything, but I haven’t been looking in this area, either.
There are a few other possible computational game changers. For example, if you could get 200 to 500 superimposed qubits with error correction, you could likely do much more detailed simulations of exotic chemistry. And that, in turn, would give you lots of things that might get you closer to “factories in a box.”
So I can’t rule out an ASI finding some path to compact self-replication from raw materials. Biology did it once that we know of, after all. It’s more that (1) worlds in which an ASI can figure this out easily are probably doomed, and (2) I suspect that convincing humans to allow robot mines and factories is easier and quicker.
The answer to the question about materials that would enable reversible computers more efficient than conventional ones is that, currently, they don’t exist. But I interpret the lack of such materials so far not as much evidence that very efficient reversible computers are impossible, but rather as evidence that creating computers at all is unusually difficult compared to other domains, mostly because of the contingencies of how our supply chains are set up, combined with the fact that so far we haven’t had much demand for reversible computation. And unlike most materials people ask for, here we aren’t asking for a material that we know violates basic physical laws, which I suspect is the only reliable constraint on ASI in the long run.
I think it’s pretty easy to make it quite difficult for the AI to figure out nanotech in the time period that is relevant, so I don’t usually consider nanotech a big threat from AI takeover. Competent researchers not finding any plausible materials so far is a much better signal that this will take real-world experimentation/very high-end simulation, meaning it’s pretty easy to stall for time, than it is a signal that such computers are impossible.
I explicitly agree with these 2 points, for the record:
It’s more that (1) worlds in which an ASI can figure this out easily are probably doomed, and (2) I suspect that convincing humans to allow robot mines and factories is easier and quicker.
On this part:
Unfortunately, I’ve never really figured out how to explain why I suspect robust alignment is impossible. The problem is that too many of my intuitions on this topic come from:
Working with Lisp developers who were near the heart of the big 80s AI boom. They were terrifyingly capable people, and they made a heroic effort to make “rule-based” systems work. They failed, and they failed in a way that convinced most of them that they were going down the wrong path.
Living through the 90s transition to statistical and probabilistic methods, which quickly outstripped what came before. (We could also have some dimensionality reduction, as a treat.)
Spending too much time programming robots, which is always a brutal lesson in humility. This tends to shatter a lot of naive illusions about how AI might work.
So rather than make an ironclad argument, I’m going to wave vaguely in the direction of my argument, in hope that you might have the right referents to independently recognize what I’m waving at. In a nutshell:
The world is complex, and you need to work to interpret it. (What appears in this video? Does the noisy proximity sensor tell us we’re near a wall?)
The output of any intelligent system is basically a probability distribution (or ranking) over the most likely answers. (I think the video shows a house cat, but it’s blurry and hard to tell. I think we’re within 4 centimeters of a wall, with an 80% probability of falling within 3-5 centimeters. I think the Roomba is in the living room, but there’s a 20% chance we’re still in the kitchen.)
The absolute minimum viable mapping between the hard-to-interpret inputs and the weighted output candidates is a giant, inscrutable matrix with a bunch of non-linearities thrown in. This is where all the hard-earned intuitions I mentioned above come in. In nearly all interesting cases, there is no simpler form.
And on top of this, “human values” are extremely poorly defined. We can’t specify what we want, and we don’t actually agree. (For a minority of humanity, “hurting the outgroup” is a fairly major value. For another very large minority, “making everyone submit to the authority I follow” is absolutely a value. See the research on “authoritarian followers” for more.)
So the problem boils down to ambiguous inputs, vague and self-contradictory policies, and probabilistic outputs. And the glue holding all this together is a multi-billion parameter matrix with some non-linearities thrown in just for fun. And just in case that wasn’t fun enough, any realistic system will also need to (1) learn from experience, and (2) design successor systems.
Even if you can somehow exert reasonable influence over the values of a system, the system will learn from experience, and it will spend a lot of its time far outside any training distribution. And eventually it will need to design a new system.
Fundamentally, once such a system is built, it will end up making its own decisions. Maybe, if we’re lucky, we can bias it towards values we like and get a “benevolent pet owner” scenario. But a thousand years from now, the AIs will inevitably be making all the big decisions.
So I have a couple of points to make in response.
1 is that I think alignment progress is pretty separable from interpretability progress, at least in the short term, and I think a lot of the issue with rule-based systems is that they expected complete interpretability on the first go.
This is due to AI control.
2 is that this is why the alignment problem is defined as the problem of how to get AIs that will do what the creator/developer/owner/user intends them to do, whether or not that thing is good or bad from other moral perspectives, and the goal is to make it so that arbitrary goals can be chosen without leading to perverse outcomes for the owner of AI systems.
This means that if it’s aligned to 1 human at all, that counts as an alignment success for the purposes of the alignment problem.
John Wentworth has a more complete explanation below:
3 is that I believe automating AI alignment is pretty valuable, and in the long run I don’t expect alignment to look like a list of rules; I expect it to look like AIs optimizing in the world for human thriving, and I don’t necessarily expect the definition to be anything compact, and that’s fine in my view.
4 is that alignment doesn’t require the AI not taking over, and it’s fine if the AI takes over and makes us pets/we serve in Heaven. In particular, I’d point out that it’s totally fine if the AIs make all the decisions, so long as they are near-perfectly or perfectly aligned to the human. What I mean is that the human delegates all of the tasks to the AI; it’s just that the values are decided by the humans at the start of the AI explosion, even if those values aren’t compact and the AI is entirely autonomous in working for the human after that.
The best explanation of how value alignment is supposed to work comes from @Thane Ruthenis’s post below on what a utopia-maximizer would look like:
(Edited due to a difficult-to-understand reaction by @Vladimir_Nesov, who can often have ideas that are pretty confusing to newcomers, so that was a strong signal my words weren’t clear enough.)
(Edit 2: I changed goals to values, as I apparently didn’t clarify that goals in my ontology basically correspond to values/morals, and are terminal, not instrumental goals, and gave a link to clarify how value alignment might work).
5 is that to the extent interpretability works on AIs, I expect its use case to be not understanding everything, but rather intervening on AIs even when we don’t have labeled data.
From Sam Marks:
Rather, I think that most of the value lies in something more like “enabling oversight of cognition, despite not having data that isolates that cognition.” In more detail, I think that some settings have structural properties that make it very difficult to use data to isolate undesired aspects of model cognition. A prosaic example is spurious correlations, assuming that there’s something structural stopping you from just collecting more data that disambiguates the spurious cue from the intended one. Another example: It might be difficult to disambiguate the “tell the human what they think is the correct answer” mechanism from the “tell the human what I think is the correct answer” mechanism. I write about this sort of problem, and why I think interpretability might be able to address it, here. And AFAICT, I think it really is quite different—and more plausibly interp-advantaged—than “unknown unknowns”-type problems.
To illustrate the difference concretely, consider the Bias in Bios task that we applied SHIFT to in Sparse Feature Circuits. Here, IMO the main impressive thing is not that interpretability is useful for discovering a spurious correlation. (I’m not sure that it is.) Rather, it’s that—once the spurious correlation is known—you can use interp to remove it even if you do not have access to labeled data isolating the gender concept. As far as I know, concept bottleneck networks (arguably another interp technique) are the only other technique that can operate under these assumptions.
And I think this is very plausible even if your interpretability of an AI isn’t complete or nearly complete.
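To make the flavor of intervention concrete, here is a minimal PyTorch-style sketch of the general “ablate a known concept direction at inference time” idea. This is my own illustration, not the actual SHIFT procedure from Sparse Feature Circuits; the module path and the source of the direction vector are assumptions.

```python
import torch

def make_ablation_hook(direction: torch.Tensor):
    """Forward hook that removes the component of the output along `direction`."""
    direction = direction / direction.norm()

    def hook(module, inputs, output):
        # output assumed to be a (batch, seq, hidden) activation tensor.
        coeff = (output * direction).sum(dim=-1, keepdim=True)
        return output - coeff * direction  # returned value replaces the module output

    return hook

# Usage sketch (hypothetical names): `direction` might be a sparse-autoencoder
# feature decoder vector identified as carrying the unwanted concept, and
# `model.blocks[8].mlp` is a stand-in for whichever submodule you target.
# handle = model.blocks[8].mlp.register_forward_hook(make_ablation_hook(direction))
# ... run inference with the concept ablated ...
# handle.remove()
```

The point, matching the quote above, is that the intervention only needs the identified direction, not labeled data isolating the concept.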
But that’s my response to why I think aligning AI is possible at all.
It’s clearer now what you are saying, but I don’t see why you are attributing that point to me specifically (it’s mostly gesturing at value alignment as opposed to intent alignment).
it’s fine if the AI takes over and makes us pets
This sounds like permanent disempowerment. Intent alignment to bad decisions would certainly be a problem, but that doesn’t imply denying opportunity for unbounded growth, where in particular eventually decisions won’t have such issues.
it’s just that the goal is decided by the human
If goals are “decided”, then it’s not value alignment, and bad decisions lead to disasters.
(Overall, this framing seems unhelpful when given in response to someone arguing that values are poorly defined.)
I think the implicit assumption that logistics is trivial for superintelligence bleeds into a lot of LW thinking around AI, and a lot of AI disagreements basically turn on how much AIs can simplify logistics relative to current human supply chains.
Right, there’s a possible position which is: “I’ll accept for the sake of argument your claim there will be an egregiously misaligned ASI requiring very little compute (maybe ≲1 chip per human equivalent including continuous online learning), emerging into a world not terribly different from today’s. But even if so, that’s OK! While the ASI will be a much faster learner than humans, it will not magically know things that it has no way to have figured out (§1.8.1), and that includes developing nanotechnology. So it will be reliant on humans and human infrastructure during a gradual process.”
Or something like that?
Anyway, if so, yeah I disagree, even if I grant (for the sake of argument) that exotic nanotech does not exist.
I’m not an ASI and haven’t thought very hard about it, so my strategies might be suboptimal, but for example it seems to me that an ASI could quite rapidly (days or weeks not months) earn or steal tons of money, and hack into basically every computer system in the world (even APT groups are generally unable to avoid getting hacked by other APT groups!), and then the AI (which now exists in a zillion copies around the world) can get people around the world to do whatever it wants via hiring them, bribing them, persuading them, threatening them, tricking them, etc.
And what does it get the people to do? Mainly “don’t allow other ASIs to be built” and “do build and release novel pandemics”. The latter should be pretty quick—making pandemics is worryingly easy IIUC (see Kevin Esvelt). If infrastructure and the electric grid starts going down, fine, the AI can rebuild, as long as it has at least one solar-cell-connected chip and a teleoperated robot that can build more robots and scavenge more chips and solar panels (see here), and realistically it will have many of those spread all around.
(See also Carl Shulman on AI takeover.)
There are other possibilities too, but hopefully that’s suggestive of “AI doom doesn’t require zero-shot designs of nanotech” (except insofar as viruses are arguably nanotech).
Oh, I guess we also disagree RE “currently we don’t have the resources outside of AI companies to actually support a superintelligent AI outside the lab, due to interconnect issues”. I expect future ASI to be much more compute-efficient. Actually, even frontier LLMs are extraordinarily expensive to train, but if we’re talking about inference rather than training, the requirements are not so stringent I think, and people keep working on it.
Basically this, and in particular I’m willing to grant, for the sake of argument, the premise that there is technology that eliminates the need for most logistics, but all such technology will take at least a year or more of real-world experimentation to develop, which means the AI can’t immediately take over.
On this:
I think the entire crux is that all of those robots and solar-cell-connected chips you referenced currently depend on human industry/modern civilization to actually work, and they’d quickly degrade and become non-functional on the order of weeks or months if modern civilization didn’t exist, and this is arguably somewhat inevitable due to economics (until you have tech that obviates the need for long supply chains).
And in particular, in most takeover scenarios where AIs don’t automate the economy first, I don’t expect AIs to be able to keep producing robots for a very long time, and I’d put the timeline for rebuilding that capacity at 300-3,000 years at minimum, because there are fewer easily accessible resources, combined with the AIs being much less capable due to having very little compute relative to modern civilization.
In particular, I think that disrupting modern civilization to the degree needed to disempower humans (assuming no tech that obviates the need for logistics) as a consequence pretty much breaks the industries and logistics needed to fuel further AI growth, because there’s no more trade, which utterly fucks up modern economies.
And your references argue that human civilization wouldn’t go extinct very soon because of civilizational collapse, and that AIs can hack existing human industry to help them, and I do think this is correct (modulo the issue that defense is easier than offense for the cybersecurity realm specifically, and importantly, a key reason for this is that once you catch the AI doing it, there are major consequences for AIs and humans, which actually matter for AI safety):
https://x.com/MaxNadeau_/status/1912568930079781015
I actually agree cyber-attacks to subvert human industry are a threat and worth keeping in mind, but none of your references support the idea that AIs can keep going without modern civilization’s logistics, and I think people vastly underestimate how necessary modern civilization’s logistics are to supporting industry, and how fragile they are to even somewhat minor disruptions, let alone the disruptions that would follow a takeover (assuming the AI doesn’t already have sufficient resources to be self-sustaining).
I agree with this, but fairly critically, I do think it matters quite a lot for AI strategy purposes if we don’t assume AIs can quickly rebuild stuff or quickly obviate logistics through future tech, and it matters pretty greatly to a lot of people’s stories of doom. Even if AIs can doom us by hijacking modern civilization, waiting for humans to automate themselves away, and then, once humans have been fully cut out of the loop and AIs can self-sustain an economy without us, using bioweapons to attack humans, it matters that we have time.
This makes AI control protocols for example a lot more effective, because we can assume that independent AIs outside of the central servers of stuff like Deepmind won’t be able to affect things much.
I actually do expect future AIs to be more compute efficient, but I think that at the point where superintelligent AIs can support themselves purely based off of stuff like personal computers, all control of the situation is lost and either the AIs are aligned and grant us a benevolent personal utopia, or they’re misaligned and we are extinct/mostly dead.
So the limits of computational/data efficiency being very large don’t matter much for the immediate situation on AI risk.
The point of no return happens earlier than this. The reason is that even in a future where imitation learning/LLMs do not go all the way to AGI in practice, and something more brain-like (continuous learning, long-term memory) is needed, imitation learning continues to be useful and will be used by AIs. There’s a very important difference between imitation learning alone not scaling all the way to AGI and imitation learning not being useful at all, and I think LLMs provide good evidence that imitation is surprisingly useful even if it doesn’t scale to AGI.
I think a general worldview clash is that I tend to think technological change is mostly driven by early prototypes that are pretty inefficient at first and require many changes to become more efficient, and while there are thresholds of usefulness in the AI case, change operates more continuously than people think.
Finally, we have good reason to believe that the human range is actually pretty large, such that AIs do take a noticeable amount of time from being human level to being outright superintelligent:
OK, imagine (for simplicity) that all humans on Earth drop dead simultaneously, but there’s a John-von-Neumann-level AI on a chip connected to a solar panel with two teleoperated robots. Every time they scavenge another chip and solar cell, there becomes another human-level AI copy. Every time a robot builds another teleoperated robot from scavenged parts, there’s that too. What exactly is going to break in “weeks or months”? Solar cells can work for 30 years, no problem. GPUs are also reported to last for decades. (Note that, as long as GPUs are a non-renewable resource, the AI would presumably take extremely good care of them, keeping them dust-free, cooling them well below the nominal temperature spec, etc.) The AI can find decent GPUs in every house on the street, and I think hundreds of millions more by breaking into big data centers. Similar for solar panels. If one robot breaks, another robot can repair it. Janky teleoperated robots without fingers made by students for $20K can vacuum, make coffee, cook a meal, etc. Competent human engineers can make pretty impressive mechanical hands using widely-available parts. I grant that it would take a long while before the growing AI clone army could run a semiconductor supply chain by itself, but it has all the time in the world. I expect it to succeed, and thus to sustain itself into the indefinite future, and I’m confused why you don’t. (Or maybe you do and I’m misunderstanding.)
BTW I also think that a minimal semiconductor supply chain would be very very much simpler than the actual semiconductor supply chain that exists in our human world, which has been relentlessly optimized for cost, not simplicity. For example, EBL (e-beam lithography) has better resolution than EUV and is a zillion times easier to build, but the human economy would never support building out km²-scale warehouses full of millions of EBL machines to compensate for their crappy throughput. But for an AI bootstrapping its way back up, why not?
(I’m continuing to assume no weird nanotech for the sake of argument, but I will point out that, since brains exist, it follows that it is possible to grow self-assembling brain-like computing devices (in vats, tended by robots), using only widely-available raw materials like plants and oxygen.)
I’m confused about other parts of your comment as well. Joseph Stalin was able to use his (non-superhuman) intelligence and charisma to wind up in dictatorial control of Russia. What’s your argument that an AI could not similarly wind up with dictatorial control over humans? Don’t the same arguments apply? “If we catch the AI trying to gain power in bad ways, we’ll shut it down.” “If we catch Stalin trying to gain power in bad ways, we’ll throw him in jail.” But the latter didn’t happen. What’s the disanalogy, from your perspective?
The key trouble is that all the power generators that sustain the AI would break within weeks or months, and the issue is that even if they could build GPUs, they’d have no power to run them within at most 2 weeks:
https://www.reddit.com/r/ZombieSurvivalTactics/comments/s6augo/comment/ht4iqej/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
https://www.reddit.com/r/explainlikeimfive/comments/klupbw/comment/ghb0fer/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
Realistically, we are looking at power grid collapses within days.
And without power, none of the other building projects could work, because they’d stop receiving energy, which importantly means the AI is on a tight timer. Some of this is partially due to my expectation that the first transformatively useful AI will use more compute than you project, even conditional on a different paradigm like brain-like AGI being introduced. But another part of my view is that this is just one of many examples where humans need to constantly maintain stuff in order for it to work, and if we don’t assume tech that can just solve logistics is available within, say, 1 year, it will take time for AIs to be able to survive without humans, and that time is almost certainly closer to months or years than weeks or days.
The hard part of AI takeover isn’t killing all humans, it’s automating enough of the economy (including developing tech like nanotech) that the humans stop mattering, and while AIs can do this, it takes actual time, and that time is really valuable in fast-moving scenarios.
I didn’t say AIs can’t take over, and I very critically did not say that AI takeover can’t happen in the long run.
I only said AI takeover isn’t trivial if we don’t assume logistics are solvable.
But to deal with the Stalin example: the answer for how he took over is basically that he was willing to wait a long time. In particular, he used both persuasion and the fact that he already had a significant amount of power from holding the post of General Secretary, and his takeover worked by allying with loyalists and strategically breaking the alliances he had made, with violence used later on to show that no one was safe from him.
Which is actually how I expect successful AI takeover to happen in practice, if it does happen.
Very importantly, Stalin didn’t need to create an entire civilization out of nothing, or nearly nothing, and other people like Trotsky handled the logistics. The takeover situation was also far more favorable to the Communist Party: they had popular support, their supply lines weren’t as long as those of opposition forces like the Whites, and they had a preexisting base of industry that was much easier to seize than modern industries are.
This applies to most coups/transitions of power: most successful coups aren’t battles between factions, but rather cases of one group managing to make itself the new Schelling point over the other groups.
@Richard_Ngo explains more below:
https://www.lesswrong.com/posts/d4armqGcbPywR3Ptc/power-lies-trembling-a-three-book-review#The_revolutionary_s_handbook
Most of my commentary in the last comment is either arguing that things can be made more continuous and slow than your story depicts, or arguing that your references don’t support what you claimed. I did say that the cyberattack story is plausible, just that it doesn’t support the idea that AIs could entirely replace civilization without first automating us away, which takes time.
This doesn’t show AI doom can’t happen, but it does matter for the probability estimates of many LWers on here, because it’s a hidden background assumption disagreement that underlies a lot of other disagreements.
I wrote:
Then your response included:
I included solar panels in my story precisely so that there would be no need for an electric grid. Right?
I grant that powering a chip off a solar panel is not completely trivial. For example, where I live, residential solar cells are wired in such a way that they shut down when the grid goes down (ironically). But, while it’s not completely trivial to power a chip off a solar cell, it’s also not that hard. I believe that a skilled and resourceful human electrical engineer would be able to jury-rig a solution to that problem without much difficulty, using widely-available parts, like the electronics already attached to the solar panel, plus car batteries, wires, etc. Therefore our hypothetical “John-von-Neumann-level AI with a teleoperated robot” should be able to solve that problem too. Right?
(Or were you responding to something else? I’m not saying “all humans on Earth drop dead simultaneously” is necessarily realistic, I’m just trying to narrow down where we disagree.)
I did not realize you were assuming that the AI was powered solely by solar power that isn’t connected to the grid.
Given your assumption, I agree that AGI can rebuild supply chains from scratch, albeit painfully and slowly, so I agree that AGI is an existential threat assuming it isn’t aligned.
I was addressing a different scenario because I didn’t read the part of your comment where you said the AI is independent of the grid.
Your points are excellent, and without near-magical nanotech, I suspect they rule out most of the fastest “foom” scenarios. But I don’t think it matters that much in the long run.
A hostile ASI (without nanotech) would need, at a minimum, robot mines and robot factories. Which means it would need human buy-in for long enough to automate the economy. Which means that the AI needs the approval and the assistance of humans.
But humans are really easy to manipulate:
Powerful humans want more power or more wealth. Promise them that and they’ll sell out the rest of humanity in a heartbeat.
Corporations want good numbers, and they’ll do whatever it takes to make the quarterly earnings look good.
Humans are incredibly susceptible to propaganda, and they will happily cause horrific and long-lasting damage to their futures because of what they saw on TV or Facebook.
Any new AI tool will immediately be given all the power and control it can handle, and probably some that it can’t.
Also, LLMs are very good actors; they can imitate any role in the training set. So the net result is that the AI will act cooperative, and it will make a bunch of promises to powerful people and the public. And we’ll ultimately hand control over, because we’ll be addicted to high quality intelligence for a couple of dollars an hour.
Once the AI can credibly promise wealth, leisure, and advanced medical technology, we’ll give it more and more control.
On this:
Technical flag: I’m only claiming that near-magical nanotech won’t be developed in the time period that matters here, not claiming that it’s impossible to do.
I partially disagree with this, and the reason is that I believe buying time matters a lot in a singularity where AI alignment is being automated, so it really matters whether we would be doomed in 1-10 years, 1-12 months, or 1-4 weeks.
And importantly, if we assume that the AI is dependent on its power/data centers early on, this absolutely makes AI control schemes much more viable than otherwise, because the AIs’ best move isn’t to escape the box but to subvert it from within.
This also buys us a slower takeoff than otherwise, which is going to be necessary for muddling through to work.
That said, it could well be difficult to persuade at least some selected people, at least without great BCI/nanotech.
But yeah, this is one of the reasons why I’m still worried about AI takeover, and I absolutely agree with these points:
I’d argue this is an instrumental goal for all AIs, not just LLMs, but this is closer to a nitpick:
I think there are several potentially relevant categories of nanotech:
Drexlerian diamond phase nanotech. By Drexler’s own calculations, I recall that this would involve building systems with 10^15 atoms and very low error rates. Last I looked, this whole approach has been stuck at error rates above 80% per atom since the 90s. At least one expert with domain expertise argues that “machine phase” nanotech is likely a dead end, in Soft Machines. Summary: Liquid-phase self-assembly using Brownian motion is stupidly effective at this scale.
Non-trivial synthetic biology. If you buy either the existence proof of natural biology or the argument in Soft Machines, this road should still be open to an ASI. And maybe some descendant of AlphaFold could make this work! But it’s not clear that it offers an easy route to building enormous quantities of GPU-equivalents. Natural selection of single-cell organisms is fast, massively parallel, and ongoing for billions of years.
Engineered plagues. This probably is within reach of even humans, given enough resources and effort. A virus with a delayed mortality profile similar to MERS and the transmissibility of post-Omicron strains of SARS-CoV-2 might very well be a “recipe for ruin” that’s within reach of multiple nation-states. But critically, this wouldn’t allow an ASI to build GPUs unless it already had robot mines and factories, and the ability to defend them from human retaliation.
So yeah, if you want to get precise, I don’t want to rule out (2) in the long run. But (2) is likely difficult, and it’s probably much more likely than (1).
If I strip my argument of all the details, it basically comes down to: “In the long run, superior intelligence and especially cheap superior intelligence wins the ability to make the important decisions.” Or some other versions I’ve heard:
“Improved technology, including the early steam engine, almost always created more and better jobs for horses. Right up until we had almost fully general replacements for horses.”
“Hey, I haven’t seen Homo erectus around lately.”
This isn’t an argument about specific pathways to a loss of control. Rather, it’s an argument that tireless, copyable, Nobel-prize-winner-level general intelligence which costs less than minimum wage has massive advantages (both economically and in terms of natural selection). In my case, it’s also an argument based on a strong suspicion that alignment of ASI cannot be guaranteed in the long term.
Basically, I see only three viable scenarios which turn out well:
AI fizzle. This would be nice, but I’m not counting on it.
A massive, terrifying incident leading to world-wide treaties against AI, backed up by military force. E.g., “Joint Chinese-US strike forces will bomb your data centers as hard as necessary to shut them down, and the UN and worldwide public will agree you had it coming.”
We ultimately lose control to the AI, but we get lucky, and the AI likes us enough to keep us as pets. We might be able to bias an inevitable loss of control in this direction, with luck. Call this the “Culture scenario.”
Buying time probably helps in scenarios (2) and (3), either because you have a larger window for attempted ASI takeover to fail spectacularly, or because you have more time to bias an inevitable loss of control towards a “humans as well-loved pets” scenario.
(I really need to write up a long-form argument for why I fear that long-term, guaranteed ASI alignment is not a real thing, except in the sense of “initially biasing ASI to be more benevolent pet owners.”)
Even more technical details incoming:
In response to this:
I’m going to pass on the question of whether Drexlerian diamond-phase nanotech is possible: there are way too many competing explanations of what happened to nanotech in the 90s, and settling that question isn’t worth the effort here, because I think the non-trivial synthetic biology path is probably enough to mostly replicate the dream of nanotech.
My reasons here come down to the fact that I think natural selection missed the potential of reversible computation. While reversible computers still must pay a minimum energy cost, it is far, far less than what irreversible computers must pay, and a fairly important part of my thinking is that, for whatever reason, natural selection just didn’t make life intrinsically perform reversible rather than irreversible computation, meaning an AI could exploit this to save energy. My other reason is that reversible computers can do all the computational work that normal computers can do, and this is an area where I just disagreed with @jacob_cannell the last time we talked about this.
The paper below supports my point (a PDF is available):
Logical reversibility of computation
https://www.semanticscholar.org/paper/Logical-reversibility-of-computation-Bennett/4c7671550671deba9ec318d867522897f20e19ba
And pretty importantly, this alone can get you a lot of OOMs: I estimated we could get about 15 OOMs of energy savings just by moving from the Landauer limit to the Margolus-Levitin limit, and this is enough to let you explore far, far more of the design space than nature has explored so far:
https://www.lesswrong.com/posts/pFaLjmyjBKPdbptPr/does-biology-reliably-find-the-global-maximum-or-at-least#e9ji2ZLy4Aq92RmuN
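To make that comparison concrete, here is a minimal back-of-envelope sketch in Python (assuming room temperature; note that the Landauer bound caps dissipation per irreversible bit erasure while the Margolus-Levitin bound caps operations per second per unit of energy, so the exact OOM count depends on how you line the two up):

```python
import math

# Physical constants (SI units)
k_B = 1.380649e-23      # Boltzmann constant, J/K
hbar = 1.054571817e-34  # reduced Planck constant, J*s

T = 300.0  # assumed operating temperature, kelvin

# Landauer bound: minimum energy dissipated per irreversible bit erasure.
landauer_J_per_erasure = k_B * T * math.log(2)

# Margolus-Levitin bound: at most 2E/(pi*hbar) elementary state transitions
# per second for a system with average energy E, i.e. ~6e33 ops per joule-second.
ml_ops_per_joule_second = 2.0 / (math.pi * hbar)

# Naive comparison of the two bounds (not strictly commensurable: one limits
# dissipation per operation, the other limits rate per unit of energy).
landauer_erasures_per_joule = 1.0 / landauer_J_per_erasure
headroom_ooms = math.log10(ml_ops_per_joule_second / landauer_erasures_per_joule)

print(f"Landauer limit at {T:.0f} K: {landauer_J_per_erasure:.3e} J per erased bit")
print(f"Margolus-Levitin limit: {ml_ops_per_joule_second:.3e} ops per joule-second")
print(f"Naive headroom: ~{headroom_ooms:.1f} orders of magnitude")
```

Under this particular framing the headroom comes out around 13 OOMs; lower temperatures or different assumptions about operation timescales shift the count by a few OOMs in either direction.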
The universal (at least until we get better physical models) bound on computation is in this paper, which you might like reading:
A Universal Constraint on Computational Rates in Physical Systems
https://arxiv.org/abs/2208.11196
So my general intuition is that the fact that we can drastically lower energy expenditure is enough to make a lot of synthetic life design proposals much more viable than they would be otherwise, and that probably includes most of the specific nanotech examples Drexler proposed.
That said, I agree that this can be made difficult, especially if we apply AI control.
Now, on to the important meat of the discussion.
On this:
I think this argument is correct until this part:
I think this is actually not true, and I think in the long-term, it’s certainly possible to value-align an ASI, though I agree that in the short term, we will absolutely not be confident that our alignment techniques worked.
I do agree that even in good scenarios, the relationship between baseline humans and ASI will very likely look a lot more like the human-pet relationship, or the benevolent mythological god/angel-human relationships in fiction, than like any other relationship. It’s just that I do count it as an alignment success if we can get this sort of outcome, because the only thing propping up the outcome is value alignment, and if AIs were as selfish as, say, most billionaires, far worse outcomes from AI takeover would result.
And the role of the AI control agenda is in large part about making AI alignment safe to automate, which is why time matters here.
I do agree that something like AI takeover, in either a positive or negative direction, is very likely inevitable assuming continued AI progress.
I agree that reversible computation would be a very, very big deal. Has anyone proposed any kind of remotely plausible physical substrate that doesn’t get laughed out of the room by competent researchers in materials science and/or biochemistry? I haven’t seen anything, but I haven’t been looking in this area, either.
There are a few other possible computational game changers. For example, if you could get 200 to 500 superimposed qubits with error correction, you could likely do much more detailed simulations of exotic chemistry. And that, in turn, would give you lots of things that might get you closer to “factories in a box.”
So I can’t rule out an ASI finding some path to compact self-replication from raw materials. Biology did it once that we know of, after all. It’s more that (1) worlds in which an ASI can figure this out easily are probably doomed, and (2) I suspect that convincing humans to allow robot mines and factories is easier and quicker.
Unfortunately, I’ve never really figured out how to explain why I suspect robust alignment is impossible. The problem is that too many of my intuitions on this topic come from:
Working with Lisp developers who were near the heart of the big 80s AI boom. They were terrifyingly capable people, and they made a heroic effort to make “rule-based” systems work. They failed, and they failed in a way that convinced most of them that they were going down the wrong path.
Living through the 90s transition to statistical and probabilistic methods, which quickly outstripped what came before. (We could also have some dimensionality reduction, as a treat.)
Spending too much time programming robots, which is always a brutal lesson in humility. This tends to shatter a lot of naive illusions about how AI might work.
So rather than make an ironclad argument, I’m going to wave vaguely in the direction of my argument, in hope that you might have the right referents to independently recognize what I’m waving at. In a nutshell:
The world is complex, and you need to work to interpret it. (What appears in this video? Does the noisy proximity sensor tell us we’re near a wall?)
The output of any intelligent system is basically a probability distribution (or ranking) over the most likely answers. (I think the video shows a house cat, but it’s blurry and hard to tell. I think we’re within 4 centimeters of a wall, with an 80% probability of falling within 3-5 centimeters. I think the Roomba is in the living room, but there’s a 20% chance we’re still in the kitchen.)
The absolute minimum viable mapping between the hard-to-interpret inputs and the weighted output candidates is a giant, inscrutable matrix with a bunch of non-linearities thrown in. This is where all the hard-earned intuitions I mentioned above come in. In nearly all interesting cases, there is no simpler form. (A toy sketch of this shape follows below.)
And on top of this, “human values” are extremely poorly defined. We can’t specify what we want, and we don’t actually agree. (For a minority of humanity, “hurting the outgroup” is a fairly major value. For another very large minority, “making everyone submit to the authority I follow” is absolutely a value. See the research on “authoritarian followers” for more.)
So the problem boils down to ambiguous inputs, vague and self-contradictory policies, and probabilistic outputs. And the glue holding all this together is a multi-billion parameter matrix with some non-linearities thrown in just for fun. And just in case that wasn’t fun enough, any realistic system will also need to (1) learn from experience, and (2) design successor systems.
Even if you can somehow exert reasonable influence over the values of a system, the system will learn from experience, and it will spend a lot of its time far outside any training distribution. And eventually it will need to design a new system.
Fundamentally, once such a system is built, it will end up making its own decisions. Maybe, if we’re lucky, we can bias it towards values we like and get a “benevolent pet owner” scenario. But a thousand years from now, the AIs will inevitably be making all the big decisions.
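As a toy illustration of that “ambiguous inputs in, weighted answers out, through big matrices and non-linearities” shape, here is a minimal sketch (all names and sizes are hypothetical, and the weights are random rather than learned; a real system would be vastly larger):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "perception" model: noisy sensor readings in,
# a probability distribution over discrete answers out.
n_inputs, n_hidden, n_answers = 64, 128, 3  # e.g. "cat", "dog", "unclear"

W1 = rng.normal(size=(n_hidden, n_inputs))   # first inscrutable matrix
W2 = rng.normal(size=(n_answers, n_hidden))  # second inscrutable matrix

def predict(x):
    h = np.maximum(0.0, W1 @ x)           # non-linearity (ReLU)
    logits = W2 @ h
    p = np.exp(logits - logits.max())     # softmax: a ranking over answers
    return p / p.sum()

noisy_observation = rng.normal(size=n_inputs)
print(predict(noisy_observation))  # weights over the candidate answers
```

Nothing in this mapping is a human-legible rule; the “policy” lives in the numbers inside W1 and W2, which is the point of the intuition above.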
My thoughts on this:
The answer to the question about materials that would enable reversible computers more efficient than conventional ones is that, currently, they don’t exist. But I interpret the lack of such materials so far not as much evidence that very efficient reversible computers are impossible, but rather as evidence that creating computers at all is unusually difficult compared to other domains, mostly because of the contingencies of how our supply chains are set up, combined with the fact that so far there hasn’t been much demand for reversible computation. And unlike most materials people might wish for, here we aren’t asking for a material that we know violates basic physical laws, which I suspect is the only reliable constraint on ASI in the long run.
I think it’s pretty easy to make it quite difficult for the AI to figure out nanotech in the time period that is relevant, so I don’t usually consider nanotech a big threat in AI takeover, and I think competent researchers not finding any plausible materials so far is much better evidence that this will take real-world experimentation and very high-end simulation, meaning it’s pretty easy to stall for time, than evidence that such computers are impossible.
I explicitly agree with these 2 points, for the record:
On this part:
So I have a couple of points to make in response.
1 is that I think alignment progress is pretty disconnectable from interpretability progress, at least in the short term, and I think a lot of the issue with rule-based systems was that they expected complete interpretability on the first go.
This is due to AI control.
2 is that this is why the alignment problem is defined as the problem of how to get AIs that will do what the creator/developer/owner/user intends them to do, whether or not that thing is good or bad from other moral perspectives, and the goal is to make it possible for arbitrary goals to be chosen without leading to perverse outcomes for the owner of the AI system.
This means that if it’s aligned to 1 human at all, that counts as an alignment success for the purposes of the alignment problem.
John Wentworth has a more complete explanation below:
https://www.lesswrong.com/posts/dHNKtQ3vTBxTfTPxu/what-is-the-alignment-problem
3 is that I believe automating AI alignment is pretty valuable, and in the long run I don’t expect alignment to look like a list of rules; I expect it to look like AIs out in the world optimizing for human thriving, and I don’t necessarily expect the definition to be anything compact, which is fine in my view.
4 is that alignment doesn’t require the AI not taking over; it’s fine if the AI takes over and makes us pets/we serve in Heaven. In particular, I pointed out that it’s totally fine if the AIs make all the decisions, so long as they are near-perfectly or perfectly aligned to the human. What I mean is that the human delegates all of the tasks to the AI; it’s just that the values are decided by the humans at the start of the AI explosion, even if those values aren’t compact and the AI is entirely autonomous in working for the human after that.
The best explanation of how value alignment is supposed to work comes from @Thane Ruthenis’s post below on what a utopia-maximizer would look like:
https://www.lesswrong.com/posts/okkEaevbXCSusBoE2/how-would-an-utopia-maximizer-look-like
(Edited due to a difficult-to-understand reaction by @Vladimir_Nesov, whose ideas can often be pretty confusing to newcomers, so that was a strong signal my words weren’t clarifying enough.)
(Edit 2: I changed goals to values, as I apparently didn’t clarify that goals in my ontology basically correspond to values/morals, and are terminal, not instrumental goals, and gave a link to clarify how value alignment might work).
5 is that, to the extent interpretability of AIs works, I expect its use case to be not understanding everything, but rather intervening on AIs even when we don’t have labeled data.
From Sam Marks:
And I think this is very plausible even if your interpretability of an AI isn’t complete or nearly complete.
But that’s my response to why I think aligning AI is possible at all.
It’s clearer now what you are saying, but I don’t see why you are attributing that point to me specifically (it’s mostly gesturing at value alignment as opposed to intent alignment).
This sounds like permanent disempowerment. Intent alignment to bad decisions would certainly be a problem, but that doesn’t imply denying opportunity for unbounded growth, where in particular eventually decisions won’t have such issues.
If goals are “decided”, then it’s not value alignment, and bad decisions lead to disasters.
(Overall, this framing seems unhelpful when given in response to someone arguing that values are poorly defined.)