I think that approach genuinely does suffer from a Sharp Left Turn once the AI’s capabilities significantly exceed ours: that seems to me like an approach where your control strategies really do need to be as smart as the thing you’re trying to control
There’s a very basic difference between the people who believe in SLTs, rapid RSI, etc. and those who don’t, and it affects their unspoken assumptions and semantics. The fact that it affects their semantics is a problem.
corrigibility has a bigger problem with extrapolation out-of-distribution and thus Goodharting than a more value-learning based approach
I don’t see why.
b) it is very, very easy for multiple groups of humans each with access to corrigible ASI to get into a war or other form of conflict using ASI-powered weapons/technologies
Agreed. I didn’t say so explicitly, but I was mainly concerned with Everybody Dies scenarios. I think a multipolar scenario where ASI are controllable and controlled by powerful interests is highly likely, but not completely fatal.
Tool AI isn’t the direction the market and demand are currently moving in, and it has exactly the same potential for empowering existing human conflicts and enhancing concentration of power as Corrigible AI, if not even more so.
Ok, but that’s a different complaint from “it’s not even possible”. Also, the market is for agents that work for you, not ones that do their own thing. That’s a point against the standard Doom argument of a sovereign AI killing everyone for its own reasons.
So I see Corrigible AI and Tool AI as probably technically feasible, but as causing massive inherent sociotechnical risks. What we need is AI that is wiser and more ethical than humans, and actually aligned to what a very wide range of humans would agree is in the general interest of all of humanity.
And a way of forcing people to use it. If merely controllable/corrigible AI is available, powerful interests are going to prefer it.
So I agree that what you describe are approaches often outlined for AI Alignment: I just disagree with calling that AI Safety. I see creating highly Corrigible AI as solving the technical AI Alignment problem at the cost of producing a different major new form of X-Risk/S-Risk from AI, so not solving AI Safety.
Neither alignment nor safety is a simple binary.
Sounds like we’re mostly in agreement!