It feels vaguely reasonable to me to have a belief as low as 15% on “Superalignment is Real Hard in a way that requires like a 10-30 year pause.” And, at 15%, it still feels pretty crazy to be oriented around racing the way Anthropic is.
Yeah, I think the only way I maybe find the belief combination “15% that alignment is Real Hard” and “racing makes sense at this moment” compelling is if someone thinks that pausing now would be too late and ineffective anyway. (Even then, it’s worth considering the risks of “What if the US, aided by AIs during takeoff, goes much more authoritarian, to the point where there’d be little difference between that and the CCP?”) Like, say you think takeoff is just a couple of years of algorithmic tinkering away, and compute restrictions (which are easier to enforce than prohibitions against algorithmic tinkering) wouldn’t even make that much of a difference now.
However, if pausing now is too late, we should have paused earlier, right? So, insofar as some people today justify racing via “it’s too late for a pause now,” where were they earlier?
Separately, I want to flag that my own best guess on alignment difficulty is somewhere in between your “Real Hard” and my model of Anthropic’s position. I’d say I’m overall closer to you here, but I find the “10-30y” thing a bit too extreme. I think that’s almost like saying, “For practical purposes, we non-uploaded humans should think of the deep learning paradigm as inherently unalignable.” I wouldn’t confidently put that below 15% (we simply don’t understand the technology well enough), but I likewise don’t see why we should be confident in such hardness, given that ML at least gives us better control over the new species’ psychology than, say, animal taming and breeding (e.g., Carl Shulman’s arguments somewhere, iirc, in his podcasts with Dwarkesh Patel). Anyway, the thing that I instead think of as the “alignment is hard” objection to the alignment plans I’ve seen described by AI companies is mostly just the sentiment, “no way you can wing this in 10 hectic months while the world around you goes crazy.” Maybe we should call this position “alignment can’t be winged.” (For the specific arguments, see posts by John Wentworth, such as this one and this one [particularly the section, “The Median Doom-Path: Slop, Not Scheming”].)
The way I could become convinced otherwise is if the position is more like, “We’ve got the plan. We think we’ve solved the conceptually hard bits of the alignment problem. Now it’s just a matter of running enough experiments where we already know the contours of the experimental setups. Frontier ML coding AIs will help us with that stuff, and then it’s mostly a question of doing enough red teaming, etc.”
However, note that even when proponents of this approach describe it themselves, it sounds more like “we’ll let AIs do most of it ((including the conceptually hard bits?))”, which to me just sounds like they plan on winging it.
My own take is that I do endorse a version of the “pausing now is too late” objection. More specifically, I think that for most purposes we should assume pauses will come too late to be effective when thinking about technical alignment. A big part of the reason is that I don’t think we will be able to convince many people that AI is powerful enough to need governance before they see massive job losses firsthand, and by that point we are well past the point of no return for when we could control AI as a species.
In particular, I think Eliezer is probably vindicated/made a correct prediction about how people would react to AI in “There’s No Fire Alarm for AGI” (more accurately: the fire alarm will go off way too late to serve as a fire alarm).
More here:
https://www.lesswrong.com/posts/BEtzRE2M5m9YEAQpX/there-s-no-fire-alarm-for-artificial-general-intelligence