Society can do way more than you seem to think to make sure ASI (more precisely: takeover-level AI) does not get built. It seems that you’re combining “ASI is going to be insanely more powerful than not just LLMs, but than any AI anyone can currently imagine” with “we will only have today’s policy levers to pull”. I think that’s only realistic if this AI gets built very suddenly and without an increase in xrisk awareness. Xrisk awareness could increase because of relatively gradual development or because of comms… In such a world, I’d be reasonably optimistic about regulating even low-flop ASI (and no, probably not via mass surveillance).
I’m not sure what policy levers you have in mind; if you’re being coy in public you can also DM me.
If you’re thinking about regulating basic research, or regulating anything that could plausibly be branded as basic research, I would note that pandemics are pretty salient after COVID (one would assume), but IIUC it is currently and always has been legal to do gain-of-function research. (The current battle-front in the GoF wars IIUC is whether the government should fund it. Actually outlawing it would be very much harder. Outlawing it internationally would be harder still.) As another example, there was an international treaty against bioweapons but the USSR secretly made bioweapons anyway.
In the USA, climate change is a huge cause that is widely (if incorrectly) regarded as existential or nearly so (56% agree with “humanity is doomed”, 55% with “the things I value most will be destroyed”, source) but carbon taxes remain deeply unpopular, Trump is gratuitously canceling green energy projects, and even before Trump, green energy projects were subject to stifling regulations like environmental reviews.
Another thing: Suppose I announced in 2003: “By the time we can make AI that can pass the Turing Test, nail PhD-level exams in every field, display superhuman persuasion, find zero-days, etc., by THEN clearly lots of crazy new options will be in the Overton window”. …I think that would have sounded (to my 2003 audience) like a very sensible proclamation, and my listeners would have all agreed that this is obviously true.
But here we are. All those things have happened. But people are still generally treating AI as a normal technology, getting inured to each new AI accomplishment within days of it happening, or being oblivious, or lying, or saying and believing whatever nonsense most suits them, etc. And thus there’s still far more regulation on selling a sandwich than on training the world’s most powerful AI. (To be clear, I think people are generally correct to treat LLMs-in-particular as a normal technology, but I think they’re correct by coincidence.)
…Anyway, all this is a bit beside the point. “Whether we’re doomed or not” is fun to argue about, but less decision-relevant than “what should we do now”, and it seems that you and I are in agreement that comms, x-risk awareness, and gradual development are all generally good, on present margins.
Although you do gesture at this, I think you’re underappreciating the importance of what some have called goalcrafting (and I think many technical AI safety researchers underestimate this).
Yes, this is an extra reason that we’re even more doomed :-P
We need goals that are both (1) technically achievable to install in an AGI and (2) good for the world / future. I tend not to expect that we’ll do so great on technical alignment that we can focus on (2) without feeling very constrained by (1); rather I expect that (1) will only offer a limited option space (and of course I think where we’re at right now is that (1) is the empty set). But I guess we’ll see.
If we make so much progress on (1) that we can type anything whatsoever into a text box and that’s definitely the AGI’s goals, then I guess I’d vote for Eliezer’s poetic CEV thing. Of course, the people with access to the text box may type something different instead, but that’s a problem regardless.
If we don’t make that much progress on (1), then goalcrafting becomes entangled with technical alignment, right?
Hmm, thinking about it more, I agree that it would be nice to build our general understanding of (2) in parallel with work on (1). E.g. can we do more to operationalize long reflection, or archipelago, or CEV, or nanny AI, etc.? Not sure how to productively “involve society” at this stage (what did you have in mind?), beyond my general feeling that very widely spreading the news that ASI could actually happen, and what that would really mean, is a very good thing.
Thanks for the comment! I agree with a lot of what you’re saying.
Regarding the policy levers: we’re doing research into that right now. I hope to be able to share a first writeup in mid-October. Shall I email it to you once it’s ready? Would really appreciate your feedback!
I agree that pandemic and climate policies have been a mess. In general, though, I think the argument “A has gone wrong, therefore B will go wrong” is not watertight. A better version of the argument would be statistical rather than anecdotal: “90% of policies have gone wrong, therefore we give 90% probability to this policy also going wrong.” I think, though, that 1) less than 90% of government policies have gone wrong, and 2) even if there were only a 10% chance of a policy successfully reducing xrisk, that would still seem worth a try.
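To make 2) concrete, here is a minimal expected-value sketch; the 10% is the figure above, and the size of the risk reduction, $\Delta$, is a hypothetical placeholder rather than a number from this discussion:

$$\mathbb{E}[\text{xrisk averted}] = p_{\text{policy works}} \times \Delta = 0.10 \times \Delta$$

So as long as the reduction $\Delta$ that a working policy would buy is substantial, the expected benefit stays substantial even at a 10% success rate.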
I think people are generally correct to treat LLMs-in-particular as a normal technology, but I think they’re correct by coincidence.
Agree, although I’m agnostic on whether LLMs or paradigms building upon them will actually lead to takeover-level AI. So people might still be consequentially wrong rather than coincidentally correct.
it seems that you and I are in agreement that comms, x-risk awareness, and gradual development are all generally good, on present margins.
Thank you, good to establish.
I agree that the goals we could implement would be limited by the state of technical alignment, but as you say, I don’t see a reason not to work on them in parallel. I’m not convinced one is necessarily much harder or easier than the other. The whole thing just seems like such a pre-paradigmatic mess that anything seems possible, and working on a defensible bet without significant downside risk seems generally good. Goalcrafting seems like a significant part of the puzzle that has received comparatively little attention (small contribution). The four options you mention could be interesting to work out further, but of course there are a zillion other possibilities. I don’t think there’s even a good taxonomy right now…?
I agree that “involving society” was poorly defined, but what I have in mind would at least include increasing our comms efforts about AI’s risks (including but not limited to extinction). Hopefully this increases the input that non-xriskers can give. Political scientists seem relevant, as do historians, philosophers, and social scientists. Artists should make art about possible scenarios. I think there should be a public debate about what alignment should mean exactly.
I don’t think any one of us (or even our bubble combined) is wise enough to decide the future of the universe unilaterally. We need to ask people: if we end up with this alignable ASI, what would you want it to do? What dangers do you see?
I agree with pretty much everything you wrote.
Anecdote: I recall feeling a bit “meh” when I heard about the Foresight AI Pathways thing and the FLI Worldbuilding Contest thing.
But when I think about it more, I guess I’m happy that they’re doing those things.
Hmm, I’m trying to remember why I initially felt dismissive. I guess I expected that the resulting essays would be implausible or incoherent, and that nobody would pay attention anyway, and thus it wouldn’t really move the needle in the big picture. (I think those expectations were reasonable, and those are still my expectations. [I haven’t read the essays in enough detail to confirm.]) Maybe my feelings are more like frustration than dismissiveness—frustration that progress is so hard. Again, yeah, I guess it’s good that people are trying that kind of thing.
Thanks, yeah, tbh I also felt dismissive about those projects. I’m one of the perhaps few people in this space who never liked scifi, and those projects felt like scifi exercises to me. Scifi feels a bit plastic to me, cheap, thin on the details, might as well be completely off. (I’m probably insulting people here, sorry about that, I’m sure there is great scifi. I guess these projects were also good, all considered.)
But if it’s real, rather than scifi, the future and its absurdities suddenly become very interesting. Maybe we should write papers with exploratory engineering and error bars rather than stories on a blog? I did like the work of Anders Sandberg for example.
What we want the future to be like, and not be like, necessarily has a large ethical component. I also have to say that ethics originating from the xrisk space, such as longtermism, tends to defend very non-mainstream ideas that I tend not to agree with. Longtermism has mostly been critiqued for its ASI claims, its messengers, and its lack of discounting factors, but I think the really controversial parts are its symmetric population ethics (leading to a claimed necessity to quickly colonize the lightcone, which I don’t necessarily share) and its debatable decision to also count AI as valued population (leading to wanting to replace humanity with AI for efficiency reasons).
I disagree with these ideas, so ethically, I’d trust a kind of informed public average more than many xriskers. I’d be more excited about papers trying their best to map possible futures, and using mainstream ethics (and fields like political science, sociology, psychology, art and aesthetics, economics, etc.) to 1) map and avoid ways to go extinct, 2) map and avoid major dystopias, and 3) try to aim for actually good futures.
Thanks!