Mind expanding on that? Which scenarios are you envisioning?
My general scenario is that as AI progresses and society reacts more to that progress, there will be incentives to increase the amount of control we have over AI, because the consequences of not aligning AIs will be very high for the developer, both reputationally and legally.
Essentially, the scenario is one where unaligned AIs like Bing are RLHFed, DPOed, or trained away with whatever alignment method is du jour, and AIs become more aligned because of the profit incentives for controlling them.
The entire Bing debacle, and how the misalignment was ultimately resolved in GPT-4, is an interesting test case: Microsoft essentially managed to take it from a misaligned chatbot to a far more aligned one. I also partially dislike the framing of RLHF as a mere mask over some true underlying behavior, because it's quite a lot more effective than that framing suggests.
More generally, my point is that in the AI case there are strong incentives to make AI controllable and weak incentives to make it uncontrollable, which is why I was optimistic about companies making aligned AIs.
When we get to scenarios that don’t involve AI control issues, things get worse.