I somewhat agree with “a society dependent on the AGIs will have less value than a pure human society”.
With respect to subverting RL, I meant the models subverting RL to prevent us from aligning them, without us even realizing it. This is plausible because it is ultimately ourselves (rather than another model) whom we can trust to label safety-alignment data: we have the greatest coverage (since we are more creative), and we are more alignment-oriented (since we might be the ones at risk).