Thank you for the very detailed comment! I’m pretty sympathetic to a lot of what you’re saying, and mostly agree with you about the three properties you describe. I also think we ought to do some more spelling-out of the relationship between gradual disempowerment and takeover risk, which isn’t very fleshed-out in the paper — a decent part of why I’m interested in it is because I think it increases takeover risk, in a similar but more general way to the way that race dynamics increase takeover risk.
I’m going to try to respond to the specific points you lay out, probably not in enough detail to be super persuasive but hopefully in a way that makes it clearer where we might disagree, and I’d welcome any followup questions off the back of that. (Note also that my coauthors might not endorse all this.)
Responding to the specific assumptions you lay out:
No egregious lying — Agree this will probably be doable, and pretty interested in the prospect of 'AI police', but not a crux for me. I think that, for instance, much of the harm caused by unethical business practices or mass manipulation is not reliant on outright lies but rather on ruthless optimisation. A lot also comes from cases where heavy manipulation of facts, in a technically not-false way, is strongly incentivised.
No strong AI rights — Agree de jure, disagree de facto, and partly this is contingent on the relative speeds of research and proliferation. Mostly I think there will be incentives and competitive pressures to give AIs decision-making power, that oversight will be hard, and that much of the harm will be emergent from complex interactions. I also think it's maybe interesting to reflect on Ryan and Kyle's recent piece about paying AIs to reveal misalignment — locally it makes sense, but globally I wonder what happens if we do a lot of that in the next few years.
No hot global war — Agree, also not a crux for me, although I think the prospect of war might generate pretty bad incentives and competitive pressures.
Overall, I think I can picture worlds where (conditional on no takeover) we reach states of pretty serious disempowerment of the kind described in the paper, without any of these assumptions fully breaking down. That said, I expect AI rights to be the most important, and the one that starts breaking down first.
As for the feedback loops you mention:
Owners of capital are aware of the consequences of their actions —
I think this is already sort of not true: I have very little sense of what my current stocks are doing, and my impression is many CEOs don't really understand most of what's going on in their companies. Maybe AIs are naturally easier to oversee in a way that helps; maybe they operate at unprecedented scale and speed in a way that hurts. Overall I'm unsure which way this cuts.
But also, I expect that the most important consequences of labor displacement by AIs are (1) the displacement of human decision-making, including over capital allocation, and (2) changes in the distribution of economic power across humans, and the resulting political incentives.
On top of that, I think a lot of the economic badness will be about aggregate competitive effects, rather than individual obviously-bad actions. If an individual CEO notices their company is doing something bad to stay competitive, which other companies are also doing, then stopping the badness in the world is a lot harder than shutting down a department.
Politicians stop systems from doing obviously bad things —
I also think this is not currently totally true; there is definitely a sense in which some politicians already do not change systems that have bad consequences (terrible material conditions for citizens, at least), partly because they themselves are beholden to some pretty unfortunate incentives. There are bad equilibria within parties, between parties, and between states.
I also think that the mechanisms which keep countries friendly to citizens specifically are pretty fragile and contingent.
So essentially, I think the standard of ‘obviously bad’ might not actually be enough.
Cultural consumption —
Here I am confused about where we're disagreeing, and I think I don't understand what you're saying. I'm not sure why people being able to choose the culture they consume would help, and I don't think it's something we assume in the paper.
I hope this sheds some light on things!
Thanks for your answer! It's helpful to get a better sense of the sorts of threats you are describing.
I am still unsure at what point the effects you describe result in human disempowerment as opposed to a concentration of power.
"I have very little sense of what my current stocks are doing, and my impression is many CEOs don't really understand most of what's going on in their companies"
I agree, but there isn't a massive gap between the interests of shareholders and what companies actually do in practice, and people are usually happy to buy shares of public corporations (buying shares is among the best investment opportunities!). When I imagine your assumptions being correct, the natural consequence I imagine is AI-run companies owned by shareholders who get most of the surplus back. Modern companies are a good example of capital ownership working for the benefit of the capital owner. If shareholders want to fill the world with happy lizards or fund art, they probably will be able to, just like current rich shareholders can. I think for this to go wrong for everyone (not just for people who don't have tons of capital) you need something else bad to happen, and I am unsure what that is. Maybe a very aggressive anti-capitalist state?
"I also think this is not currently totally true; there is definitely a sense in which some politicians already do not change systems that have bad consequences"
I can see how this could be true (e.g. politicians are under pressure from a public that has been brainwashed by engagement-maximizing algorithms, in a way that undermines the shareholders' power without actually redistributing the wealth, instead spending it all on big national AI projects that produce nothing but more AIs), but I feel like that requires some very weird things to be true (e.g. the engagement-maximizing algorithms above result in a very unlikely equilibrium absent an external force that pushes both against shareholders and against redistribution). I can see how the state could enable massive AI projects by massive AI-run orgs, but I think it's way less likely that nobody (e.g. not the shareholders, not the taxpayer, not corrupt politicians, …) gets massively rich (and able to choose what to consume).
About culture, my point was basically that I don't think the evolution of media will be very disempowerment-favored. You can make better-tailored AI-generated content and AI friends, but I mostly don't see how that results in everyone being completely fooled about the state of the world in a way that enables the other dynamics.
I feel like my position is very far from airtight; I am just trying to point at what feel like holes in the story, holes I can't all fill simultaneously in a coherent way (e.g. how did the shareholders lose their purchasing power? What are the concrete incentives that prevent politicians from winning with campaigns like "everyone is starving while we build gold statues in favor of AI gods, how about we don't"? What stops people from simply avoiding media that has already, obviously, brainwashed 10% of the population into preferring AI gold statues to not starving?). I feel like you might be able to describe concrete and plausible scenarios where the vague things I say are obviously wrong, but I am not able to generate such plausible scenarios myself. I think your position would really benefit from a simple caricature scenario in which each step feels plausible, and which ends up not with power concentrated in the hands of shareholders / dictators / corrupt politicians, but with power in the hands of AIs colonizing the stars with values that are not endorsed by any single human (nor are a reasonable compromise between human values), while the remaining humans slowly starve to death.
I was convinced by "What does it take to defend the world against out-of-control AGIs?" that there is at least some bad equilibrium that is vaguely plausible, in part because the author gave an existence proof by describing a concrete story. I feel like this argument is missing a similar existence proof (one that would also help me guess what your counterarguments to various objections would be).