I think that approach genuinely does suffer from a Sharp Left Turn once the AI’s capabilities significantly exceed ours: that seems to me like an approach where your control strategies really do need to be as smart as the thing you’re trying to control
There’s a very basic difference between the people who believe in SLTs, rapid RSI, etc. and those who don’t, and it affects their unspoken assumptions and semantics. The fact that it affects their semantics is a problem.
corrigibility has a bigger problem with extrapolation out-of-distribution and thus Goodharting than a more value-learning based approach
I don’t see why.
b) it is very, very easy for multiple groups of humans each with access to corrigible ASI to get into a war or other form of conflict using ASI-powered weapons/technologies
Agreed. I didn’t say so explicitly, but I was mainly concerned with Everybody Dies scenarios. I think a multipolar scenario where ASI are controllable and controlled by powerful interests is highly likely, but not completely fatal.
Tool AI isn’t the direction the market and demand are currently moving in, and it has exactly the same potential for empowering existing human conflicts and enhancing concentration of power as Corrigible AI, if not even more so.
Ok, but that’s a different complaint from “it’s not even possible”. Also, the market is for agents that work for you, not ones that do their own thing. That’s a point against the standard Doom argument of a sovereign AI killing everyone for its own reasons.
So I see Corrigible AI and Tool AI as probably technically feasible, but as causing massive inherent sociotechnical risks. What we need is AI that is wiser and more ethical than humans, and actually aligned to what a very wide range of humans would agree is in the general interest of all of humanity.
And a way of forcing people to use it. If merely controllable/corrigible AI is available, powerful interests are going to prefer it.
So I agree that what you describe are approaches often outlined for AI Alignment: I just disagree with calling that AI Safety. I see creating highly Corrigible AI as solving the technical AI Alignment problem at the cost of producing a different major new form of X-Risk/S-Risk from AI, so not solving AI Safety.
Neither alignment nor safety is a simple binary.
Sounds like we’re mostly in agreement!