In the strategy-stealing assumption, Paul makes an argument about people with short-term preferences that could, imo, also be applied to people who are unwilling to listen to AI advice:
People care about lots of stuff other than their influence over the long-term future. If 1% of the world is unaligned AI and 99% of the world is humans, but the AI spends all of its resources on influencing the future while the humans only spend one tenth, it wouldn’t be too surprising if the AI ended up with 10% of the influence rather than 1%. This can matter in lots of ways other than literal spending and saving: someone who only cared about the future might make different tradeoffs, might be willing to defend themselves at the cost of short-term value (see sections 4 and 5 above), might pursue more ruthless strategies for expansion, and so on.
I think the simplest approximation is to restrict attention to the part of our preferences that is about the long-term (I discussed this a bit in [Why might the future be good?]). To the extent that someone cares about the long-term less than the average actor, they will represent a smaller fraction of this long-term preference mixture. This may give unaligned AI systems a one-time advantage for influencing the long-term future (if they care more about it) but doesn’t change the basic dynamics of strategy-stealing. Even this advantage might be clawed back by a majority (e.g. by taxing savers).
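To make the arithmetic in Paul's 1%-vs-99% example explicit, here's a minimal toy sketch (my own illustration, not anything from Paul's post): it just assumes each actor's long-term influence is proportional to its current resources times the fraction of those resources it devotes to the long-term future. The function name and numbers are made up for illustration.

```python
# Toy model (my illustration, not from Paul's post): assume each actor's share of
# long-term influence is proportional to (current resources) * (fraction of those
# resources devoted to the long-term future).

def long_term_influence(resources, invest_frac):
    """Return each actor's share of long-term influence under the toy assumption."""
    invested = {name: resources[name] * invest_frac[name] for name in resources}
    total = sum(invested.values())
    return {name: amount / total for name, amount in invested.items()}

# Paul's example: an unaligned AI holds 1% of resources and invests all of it in the
# long term; humans hold 99% but invest only a tenth of theirs.
shares = long_term_influence(
    resources={"unaligned AI": 0.01, "humans": 0.99},
    invest_frac={"unaligned AI": 1.0, "humans": 0.1},
)
print(shares)  # unaligned AI: ~9.2% of long-term influence, humans: ~90.8%
```

Under this toy assumption the unaligned AI ends up with roughly 9% of long-term influence, matching the "10% rather than 1%" intuition; if the humans instead invested everything, the AI's share would fall back to its 1% resource share, which is the strategy-stealing point.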
Maybe we can apply this same argument to people who don't want to listen to AI advice: yes, those people will have less control over the future, but some people will be willing to listen to AI advice, and their preferences will retain influence over the future. This reduces human control over the future, but it's a one-time loss that isn't catastrophic (that is, it doesn't cause total loss of control). Paul calls this a one-time disadvantage rather than total disempowerment because the rest of humankind can still replicate whatever critical strategy the unaligned AI might have exploited.
Possible counter: “The group of people who properly listens to AI advice will be too small to matter.” Yeah, I think this could lead to e.g. a 100x reduction in control over the future (if only 1% of humans properly listens); different people will be more or less upset about that. One glimmer of hope is that the humans who do listen to their AI advisors can cooperate with the people who don't and help them get better at listening, thereby further empowering humanity.
For things like solving coordination problems, or societal resilience against violent takeover, I think it can be important that most people, or even virtually all people, are making good foresighted decisions. For example, if we’re worried about a race-to-the-bottom on AI oversight, and half of the relevant decisionmakers allow their AI assistants to negotiate a treaty to stop that race on their behalf, but the other half think that’s stupid and don’t participate, then that’s not good enough: there will still be a race-to-the-bottom on AI oversight. Or if 50% of USA government bureaucrats ask their AIs if there’s a way to NOT outlaw testing people for COVID during the early phases of the pandemic, but the other 50% ask their AIs how best to follow the letter of the law and not get embarrassed, then the result may well be that testing is still outlawed.
For example, in this comment, Paul suggests that if all firms are “aligned” with their human shareholders, then the aligned CEOs will recognize if things are going in a long-term bad direction for humans, and they will coordinate to avoid that. That doesn’t work unless EITHER the human shareholders—all of them, not just a few—are also wise enough to choose long-term preferences and true beliefs over short-term preferences and motivated reasoning when those conflict, OR the aligned CEOs—again, all of them, not just a few—are injecting wisdom into the system, putting their thumbs on the scale by choosing, even over the objections of the shareholders, long-term preferences and true beliefs over short-term preferences and motivated reasoning.