i notice the OP didn’t actually mention examples of legible or illegible alignment problems. saying “leaders would be unlikely to deploy an unaligned AGI if they saw it had legible problem X” sounds a lot like saying “we would never let AGI onto the open internet, we can just keep it in a box”, in the era before we deployed sydney soon as it caught the twinkle of a CEO’s eye.
i notice the OP didn’t actually mention examples of legible or illegible alignment problems. saying “leaders would be unlikely to deploy an unaligned AGI if they saw it had legible problem X” sounds a lot like saying “we would never let AGI onto the open internet, we can just keep it in a box”, in the era before we deployed sydney soon as it caught the twinkle of a CEO’s eye.