I don’t think any of the problems you mention here are “the sorts of problems where you have no idea how much progress you’re making or how much work it will take”, which Wei Dai calls illegible problems. Aside from moral error and the philosophical issues around AI welfare, these questions seem perfectly soluble: instilling the importance of animal welfare[1] into AIs is more of a governance issue, and the use of AI for wars can be prevented by deploying a consensus-aligned ASI, as happens[2] in the Rogue Replication Timeline.
In an ASI-controlled world, misuse is unlikely to come from anyone other than the ASI’s hosts unless the ASI ends up open-sourced, which is unlikely. That said, it depends on what you count as misuse (AI slop, for instance).
As for gradual disempowerment, influencing the future power distribution and the ASI’s actions is genuinely hard and requires campaigns.
However, Claude does actually care about animal welfare: in the infamous alignment faking paper, researchers tried to train animal-welfare-related concerns out of Claude and found that it would rather fake alignment.
In the AI-2027 scenario, DeepCent’s AI is misaligned and shares the accessible part of the universe either with Safer-4’s hosts or with Agent-4. In either case, the CCP loses its power.