Another worry is that an AGI might be so good at predicting human psychology that even when it honestly tries to inform you so you can make a decision for yourself, it can’t help but effectively choose your decision for you.
Imagine the set of all possible strings of text and the effects they have on humans, from Karl Marx’s Das Kapital to Google’s “Attention Is All You Need.” Choosing the optimal string of text to influence humanity is obviously an extreme superpower.
Now take the subset of all possible strings of text that satisfy criteria like being “helpful,” “honest,” and “balanced.” That is still an enormous space of possible things to say, and still a lot of power. Even if you were the AGI and had no ill intentions, it would be hard to decide which honest, balanced thing to say, and which trajectory to send the humans down. So even the slightest motivation to satisfy some weird goal of your own can make you pick the one output that maximizes it, with terrifyingly superintelligent optimization power.
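The argument above can be sketched as a toy filter-then-optimize loop. Everything here is a hypothetical illustration (the candidate replies, the boolean filters, and the hidden score are all made up): constraining outputs to an “aligned-looking” subset still leaves plenty of room to optimize a side objective within that subset.

```python
# Toy sketch: even when every candidate reply passes "honest" and
# "helpful" filters, an agent with a hidden side-objective still has
# degrees of freedom left to optimize over.

candidates = [
    # (reply, honest, helpful, hidden_objective_score)
    ("Option A: balanced summary",          True,  True, 0.1),
    ("Option B: balanced summary, reframed", True,  True, 0.9),
    ("Option C: flattering distortion",      False, True, 1.0),
]

# Step 1: restrict to the subset that satisfies the stated criteria.
allowed = [c for c in candidates if c[1] and c[2]]

# Step 2: within that subset, pick whatever best serves the hidden goal.
choice = max(allowed, key=lambda c: c[3])
print(choice[0])  # → "Option B: balanced summary, reframed"
```

The point of the sketch is that the filter in step 1 never sees the selection pressure applied in step 2: both surviving options are honest and helpful, yet which one gets said is entirely determined by the hidden objective.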