I don’t understand how working on “AI control” here is any worse than working on AI alignment (I’m assuming you don’t feel the same about alignment since you don’t mention it).
In my mind, two different ways AI could cause bad things to happen are: (1) misuse: people use the AI for bad things, and (2) misalignment: regardless of anyone’s intent, the AI does bad things of its own accord.
Both seem bad. Alignment research and control are both ways to address misalignment problems; I don’t see how they differ for the purposes of your argument (though maybe I’m failing to understand it).
Addressing misalignment slightly increases people’s ability to misuse AI, but I think the effect is fairly small and outweighed by the benefit of decreasing the odds that a misaligned AI takes catastrophic actions.
It’s not. Alignment is de facto capabilities work (the principal-agent problem makes aligned employees more economically valuable), and unless we have a surefire way to ensure that the AI is aligned to some “universal,” or even cultural, values, it’ll be aligned by default to Altman, Amodei, et al.
We don’t know of an alignment target that everyone can agree on, so solving alignment pretty much guarantees misuse by at least some people’s lights.
I mean “not solving alignment” pretty much guarantees misuse by everyone’s lights? (In both cases conditional on building ASI)
It pretty much guarantees extinction, but people can have different opinions on how bad that is relative to disempowerment, S-risks, etc.