I do not know if you consider gradual disempowerment to be an illegible problem in AI safety (as I do), but it is certainly a problem independent of corrigibility/alignment.
As such, work on either legible or illegible problems in alignment/corrigibility can have the same effect. Is AI safety worth pursuing when it could lead to a world with fundamental power shifts to the disadvantage of most humans?
I think my brain was trying to figure out why I felt inexplicably bad upon hearing that Joe Carlsmith was joining Anthropic to work on alignment, despite repeatedly saying that I wanted to see more philosophers working on AI alignment/x-safety. I now realize what I really wanted was for philosophers, and more people in general, to work on the currently illegible problems, especially or initially by making them more legible.
I agree heartily, and I feel there have been various expressions of the “paradox” of alignment research: it is a balancing act between enabling accelerationism and enabling safety. Ultimately, however, both pursuits serve the end goal of aligned AI.
Aligned AI could, optimistically, lead to a post-scarcity utopia, but it could also lead to highly dystopian power dynamics. Ensuring that the optimist’s hope is realized seems (to me) to be a highly illegible problem. Those in the AI safety research space largely ignore it in favor of tackling problems that are more legible by comparison, including alignment problems that are themselves illegible.
All of this is to say I feel the same thing you feel, but for all of AI safety research.
The extreme variance of responses/reception to the GD paper indicates that it is an obvious thing for some people (e.g., Zvi in his review of it), whereas for other people it’s a non-issue if you solve alignment/control (I think Ryan Greenblatt’s responses under one of Jan Kulveit’s posts about GD).
So I’d say it’s a legible problem for some (sub)groups and illegible for others, although there are some issues around conceptual engineering of the bridge between GD and orthodox AI X-risk that, as far as I’m aware, no one has nailed down yet.
I believe this is the response you’re referring to. Interestingly, within it he says:
I do worry about human power grabs: some humans obtaining greatly more power as enabled by AI (even if we have no serious alignment issues). However, I don’t think this matches the story you describe and the mitigations seem substantially different than what you seem to be imagining.
Yes, GD largely imagines power concentrating directly in the hands of AI systems themselves rather than in those of a small group of people. But if what we strictly care about is disempowerment, the only difference between the two scenarios is the agenda of those in control, not the disempowerment itself.
This is the problem I was referring to that is independent of alignment/corrigibility; apologies for the lack of clarity.