Thanks a lot for writing, I think these are two fascinating pieces. I know I’m late, but couldn’t resist still writing out some thoughts.
I tend to agree that a new architecture will be existentially more relevant than LLMs (medium probability over the next few decades, combined with high risk conditional on it happening), and that this could still foom and doom (although I’m not at all certain about any of this: my p(doom) is still ~10%).
I also have some thoughts about what I think you’re saying:
Society can do far more than you seem to think to make sure ASI (more precisely: takeover-level AI) does not get built. You seem to be combining “ASI is going to be insanely more powerful than not just LLMs, but than any AI anyone can currently imagine” with “we will only have today’s policy levers to pull”. I think that’s only realistic if this AI gets built very suddenly and without an increase in xrisk awareness. Xrisk awareness could increase because of relatively gradual development, or because of comms such as If Anyone Builds It, Everyone Dies, comms by professors, podcasts and social media, activism, etc. It’s by no means certain that either of these (comms or gradual development) will happen. But if one of them does, I think a better model would be “ASI is going to be insanely more powerful than anyone can currently imagine, and therefore we’re going to have access to insanely more regulatory options than we currently do.” In such a world, I’d be reasonably optimistic about regulating even low-flop ASI (and no, probably not via mass surveillance).
Although you do gesture at this, I think you’re underappreciating the importance of what some have called goalcrafting (and I think many technical AI safety researchers underestimate this too). I’m somewhat relieved that you include goals such as a Long Reflection and handing over world power to a process (I’d unironically suggest a slow, boring, very democratic, woke, anti-progress UN committee). But of course there are a million ways in which we can end up with a bad universe if the wrong people, processes, or things get the power. I think of the wrong people not as baddies/black hats/psychopaths, but as humans who try their best yet have cognitive and moral weaknesses, as we all do, and are a bit power-seeking; for these reasons they are not up to the task of determining the future of the universe. I think none of us is. Therefore, if we want any chance of survival in a world where we develop unipolar ASI, we need to prioritize goalcrafting as much as technical alignment work. We should involve society in this, not just AI safety and tech people.
I think the researchers inventing ASI will possibly not realize the power of what they’ve invented until they see a clear demonstration. Until then, they won’t work on alignment, they won’t think about goalcrafting, and they won’t attempt a pivotal act even if they have worked on alignment. If we want either of those things, we’ll have to somehow force researchers to do them, or bring in others (e.g. via regulation).
Reasons why I think unregulated ASI researchers themselves will probably not commit a pivotal act include:
They will probably not be highly xrisk-aware. They will probably not have thought about AI safety in significant detail.
Even if they had, committing a pivotal act would be an extremely high-risk thing to do.
Committing a pivotal act would be illegal. If it somehow went wrong or was misunderstood (which seems likely), they would probably be criminally prosecuted.
No one would expect them to commit a pivotal act or blame them for not doing it.
I could imagine a government, after ASI’s power is somehow demonstrated to it, trying to block all other sources of ASI, both abroad (first in adversary countries) and at home (as a matter of public order). Using their intent-aligned ASI, they could perhaps achieve this. That’s kind of a pivotal act. The downside, of course, is that the regulating government would have absolute power for eternity.
In general, I think we should think much more carefully about realistic ways to make technically-aligned unipolar ASI go well. Currently, we don’t have any. I think we really need society at large for this. Apart from that, I agree that we should try working on technical alignment of non-LLM ASI, as you suggest.
Society can do way more than you seem to think to make sure ASI (more precisely: takeover-level AI) does not get built. It seems that you’re combining “ASI is going to be insanely more powerful than not just LLMs, but than any AI anyone can currently imagine” with “we will only have today’s policy levers to pull”. I think that’s only realistic if this AI will get built very suddenly and without an increase in xrisk awareness. Xrisk awareness could increase because of relatively gradual development or because of comms… In such a world, I’d be reasonably optimistic about regulating even low-flop ASI (and no, not via mass surveillance, probably).
Thanks! I’m not sure what policy levers you have in mind; if you’re being coy in public you can also DM me.
If you’re thinking about regulating basic research, or regulating anything that could plausibly be branded as basic research, I would note that pandemics are pretty salient after COVID (one would assume), but IIUC it is currently and always has been legal to do gain-of-function research. (The current battle-front in the GoF wars IIUC is whether the government should fund it. Actually outlawing it would be very much harder. Outlawing it internationally would be harder still.) As another example, there was an international treaty against bioweapons but the USSR secretly made bioweapons anyway.
In the USA, climate change is a huge cause that is widely (if incorrectly) regarded as existential or nearly so (56% agree with “humanity is doomed”, 55% with “the things I value most will be destroyed”, source) but carbon taxes remain deeply unpopular, Trump is gratuitously canceling green energy projects, and even before Trump, green energy projects were subject to stifling regulations like environmental reviews.
Another thing: Suppose I announced in 2003: “By the time we can make AI that can pass the Turing Test, nail PhD-level exams in every field, display superhuman persuasion, find zero-days, etc., by THEN clearly lots of crazy new options will be in the Overton window”. …I think that would have sounded (to my 2003 audience) like a very sensible proclamation, and my listeners would have all agreed that this is obviously true.
But here we are. All those things have happened. But people are still generally treating AI as a normal technology, getting inured to each new AI accomplishment within days of it happening, or being oblivious, or lying, or saying and believing whatever nonsense most suits them, etc. And thus there’s still far more regulation on selling a sandwich than on training the world’s most powerful AI. (To be clear, I think people are generally correct to treat LLMs-in-particular as a normal technology, but I think they’re correct by coincidence.)
…Anyway, all this is a bit beside the point. “Whether we’re doomed or not” is fun to argue about, but less decision-relevant than “what should we do now”, and it seems that you and I are in agreement that comms, x-risk awareness, and gradual development are all generally good, on present margins.
Although you do gesture at this, I think you’re underappreciating the importance of what some have called goalcrafting (and I think many technical AI safety researchers underestimate this).
Yes, this is an extra reason that we’re even more doomed :-P
We need goals that are both (1) technically achievable to install in an AGI and (2) good for the world / future. I tend not to expect that we’ll do so great on technical alignment that we can focus on (2) without feeling very constrained by (1); rather I expect that (1) will only offer a limited option space (and of course I think where we’re at right now is that (1) is the empty set). But I guess we’ll see.
If we make so much progress on (1) that we can type anything whatsoever into a text box and that’s definitely the AGI’s goals, then I guess I’d vote for Eliezer’s poetic CEV thing. Of course, the people with access to the text box may type something different instead, but that’s a problem regardless.
If we don’t make that much progress on (1), then goalcrafting becomes entangled with technical alignment, right?
Hmm, thinking about it more, I agree that it would be nice to build our general understanding of (2) in parallel with work on (1). E.g. can we do more to operationalize the Long Reflection, or archipelago, or CEV, or nanny AI, etc.? I’m not sure how to productively “involve society” at this stage (what did you have in mind?), beyond my general feeling that very widely spreading the news that ASI could actually happen, and what that would really mean, is a very good thing.
Thanks for the comment! I agree with a lot of what you’re saying.
Regarding the policy levers: we’re doing research into that right now. I hope to be able to share a first writeup mid October. Shall I email it to you once it’s there? Would really appreciate your feedback!
I agree that pandemic and climate policies have been a mess. In general, though, I think the argument “A has gone wrong, therefore B will go wrong” is not watertight. A better version of the argument would be statistical rather than anecdotal: “90% of policies have gone wrong, therefore we give 90% probability to this policy also going wrong.” But I think that 1) fewer than 90% of government policies have gone wrong, and 2) even if there were only a 10% chance of a policy successfully reducing xrisk, that would still seem worth a try.
I think people are generally correct to treat LLMs-in-particular as a normal technology, but I think they’re correct by coincidence.
Agree, although I’m agnostic on whether LLMs or paradigms building upon them will actually lead to takeover-level AI. So people might still be consequentially wrong rather than coincidentally correct.
it seems that you and I are in agreement with you that comms, x-risk awareness, and gradual development are all generally good, on present margins.
Thank you, good to establish.
I agree that the goals we could implement would be limited by the state of technical alignment, but as you say, I don’t see a reason not to work on them in parallel. I’m not convinced one is necessarily much harder or easier than the other. The whole thing just seems like such a pre-paradigmatic mess that anything seems possible, and work on a defensible bet without significant downside risk seems generally good. Goalcrafting seems like a significant part of the puzzle that has received comparatively little attention (small contribution). The four options you mention could be interesting to work out further, but of course there are a zillion other possibilities. I don’t think there’s even a good taxonomy right now?
I agree that “involving society” was poorly defined, but what I have in mind would at least include increasing our comms efforts about AI’s risks (including but not limited to extinction). Hopefully this increases the input that non-xriskers can give. Political scientists seem relevant, as do historians, philosophers, and other social scientists. Artists should make art about possible scenarios. I think there should be a public debate about what alignment should mean, exactly.
I don’t think any of us (or even our bubble combined) is wise enough to decide the future of the universe unilaterally. We need to ask people: if we end up with this alignable ASI, what would you want it to do? What dangers do you see?
I agree with pretty much everything you wrote.
Anecdote: I recall feeling a bit “meh” when I heard about the Foresight AI Pathways thing and the FLI Worldbuilding Contest thing.
But when I think about it more, I guess I’m happy that they’re doing those things.
Hmm, I’m trying to remember why I initially felt dismissive. I guess I expected that the resulting essays would be implausible or incoherent, and that nobody would pay attention anyway, and thus it wouldn’t really move the needle in the big picture. (I think those expectations were reasonable, and those are still my expectations. [I haven’t read the essays in enough detail to confirm.]) Maybe my feelings are more like frustration than dismissiveness—frustration that progress is so hard. Again, yeah, I guess it’s good that people are trying that kind of thing.
Thanks, yeah, tbh I also felt dismissive about those projects. I’m one of the perhaps few people in this space who never liked scifi, and those projects felt like scifi exercises to me. Scifi feels a bit plastic to me: cheap, thin on the details, might as well be completely off. (I’m probably insulting people here, sorry about that; I’m sure there is great scifi. And I guess these projects were also good, all things considered.)
But if it’s real, rather than scifi, the future and its absurdities suddenly become very interesting. Maybe we should write papers with exploratory engineering and error bars rather than stories on a blog? I did like the work of Anders Sandberg for example.
What we want the future to be like, and not be like, necessarily has a large ethical component. I also have to say that ethics originating from the xrisk space, such as longtermism, tends to defend very non-mainstream ideas that I don’t agree with. Longtermism has mostly been critiqued for its ASI claims, its messengers, and its lack of discounting factors, but I think the really controversial parts are its symmetric population ethics (leading to a perceived necessity to quickly colonize the lightcone, which I don’t share) and its debatable decision to count AI as valued population too (leading to wanting to replace humanity with AI for efficiency reasons).
I disagree with these ideas, so ethically, I’d trust a kind of informed public average more than many xriskers. I’d be more excited about papers trying their best to map possible futures, and using mainstream ethics (and fields like political science, sociology, psychology, art and aesthetics, economics, etc.) to 1) map and avoid ways to go extinct, 2) map and avoid major dystopias, and 3) try to aim for actually good futures.