The benevolence of the butcher

A few days ago I published this post on the risks of powerful transformative AGI (by which I meant AGI that takes off fast and pretty much rules the world in no time), even if aligned. Among the comments was one by Paul Christiano which I found very interesting, but which focused on a different scenario: a slower takeoff in which AGI stays with us as a regular part of our economy for a bit longer. This post elaborates on the answer I gave there, because it brought to mind a different kind of risk.

It’s common, in rebuttals to pessimism about AGI, to compare it to past technologies and point out how they all eventually ended up boosting productivity and thus raising average human welfare in the long run (though I would also suggest that we not completely ignore the short run: after all, we’re the ones most likely to live through it. I don’t just want the destination to be nice, I want the trip to be reasonably safe!). I worry, however, that carrying this way of thinking over to AGI might be a critical error of extrapolation: applying knowledge that worked in one domain to a different domain in which some of the critical assumptions that knowledge relied on no longer hold.

Specifically, when one thinks of any technology developed during or after the industrial revolution, one thinks of a capitalist, free-market economy. In such an economy, there are people who mostly own the capital (the land, the factories, and any other productive infrastructure) and people who mostly work for the former, putting the capital to use so it can actually produce wealth. The capital acts as a force multiplier which makes the labour of a single human worth tens, hundreds, thousands of times what it would have been in a pre-industrial era; but ultimately, it is still a multiplier. A thousand times zero is zero: the worker is still an essential ingredient. The question of how this surplus in productivity should be fairly split between the returns that reward capital owners for the risk they take and the wages paid to workers for the labour they actually put in has been a… somewhat contentious issue throughout the last two centuries, but every capitalist economy exists in some kind of equilibrium that is satisfactory enough to at least keep the social fabric from straight up unravelling. Mostly, both groups need each other. And not just that: since workers are also consumers, their participation in the economy is vital to make the huge gains of industrial productivity worth anything at all, and their having higher living standards (including, crucially, good literacy and education) is essential to their ability to contribute to systems that have been growing more and more cognitively complex by the year. These forces are an essential part of what propelled our society to its current degree of general prosperity through the 19th, 20th and now 21st centuries.

But this miracle is not born of disinterested generosity. Rather, it has been achieved through a lot of strife, and it is an equilibrium between different forms of self-interest. The whole sleight of (invisible) hand with which free-market capitalism makes people richer is that making other people well off is the best way to make yourself very well off. To quote Adam Smith himself, “it is not from the benevolence of the butcher, the brewer, or the baker that we expect our dinner, but from their regard to their own self-interest”. In AI language, one can think of a capitalist society as a sort of collective superintelligence whose terminal goal is everyone’s selfish interest in personally living better, roughly weighted by how much capital (and thus steering power over the economy) each person controls, but structured in such a way that its instrumental goal becomes generating wealth and well-being all around. Not that these two goals are always perfectly aligned: if a factory can get away with cutting costs by polluting a river, it often will (or else it will be punished, for holding back, by competitors that do). But as long as the rules are well designed, the coupling is, if not perfect, at least satisfactory.

AGI risks completely breaking that. AGI does not just make workers more productive, it replaces them, and in doing so it could entirely decouple those two goals: whoever owns the capital could achieve personal prosperity with no need for collective prosperity. Consider a scenario in which AGI and human-equivalent robotics are developed and end up owned (e.g. via exclusive control of the infrastructure that runs them, with closed-source models) by a group of, say, 10,000 people who hold some share in this automation capital. If these people have exclusive access to it, a perfectly functional equilibrium is “they trade among peers the goods produced by their automated workers and leave everyone else to fend for themselves”. Sam Altman, in his Moore’s Law for Everything manifesto, suggests a scheme of UBI funded by a tax on capital, which he claims would redistribute the profits of AGI to everyone. But that essentially relies on the benevolence of the butcher for our dinner. It’s possible that some companies might indeed do that, just as some companies today make genuine efforts to pay their workers more fairly or to be more environmentally conscious, above and beyond what simply benefits them in terms of PR. But as long as the incentives don’t favour it, they will be the exception, not the rule. If AGI can do anything a human can, possibly better than 90% of real human workers, then those who don’t control it will have no leverage over those who do. Strikes are pointless, because you can’t withdraw labour no one wants. Riots and revolts are pointless, because no robot army will ever hesitate to shoot you or turn against its master out of sympathy. Every single rule we think we know about how advances in productivity benefit the rest of society would break down.

(I guess Sam Altman’s proposal might work out if his full plan were to become the only capitalist in the world, then to become immortal so that no one else ever has to inherit his throne, and then to hold himself to some kind of binding vow never to abandon the values he committed to. I think it says a lot about the utter insanity of the situation that I can’t rule that out completely.)

Now, to steelman the possible criticisms of my argument and end the post on a somewhat more positive note, here are a few ways I can think of to escape the trap:

  • make AGI autonomous and distributed, so no one has sole control over it: this solves the risks of centralised control, but of course it creates about a thousand others. Still, if it were possible to safely align and deploy such an AGI, this would probably be the surest way to avoid the decoupling risk;

  • keep AGI cognitive, no robotics: this keeps humans in the loop for some pretty fundamental stuff without which nothing else is possible (food, minerals, steel and so on). Honestly, though, I’m not sure why, if we weren’t able to stop ourselves from creating AGI, we’d suddenly draw the line at robotics. The debates would be the same all over again. It would also be ironic, to say the least, if instead of freeing humanity from toil, automation ended up forcing it back into the fields and the mines as the best possible way to stay relevant. Besides, if we stay on Earth, we can’t all keep extracting resources at a much greater pace than we already are: our support systems are strained as it is. I suppose recycling plants would have a boom;

  • keep humans relevant for alignment: even if AGI gets creative enough to not need new human-generated material for training, it’s reasonable to expect it might need a constant stream of human-values-laden datasets to keep it in line with our expectations. Upvoting and downvoting short statements to RLHF the models that have taken all of the fun jobs may not be the most glamorous future, but it’s a living. More generally, humans could be the “idea guys” who organise the work of AGIs in new enterprises, but I don’t know if you can build a sustainable society in which everyone is essentially a start-up founder with robotic workers;

  • keep AGI non-agentic, and have humans always direct its actions: this one is a stronger version of the previous one. It’s better IMO since it also leaves value-laden choices firmly in human hands, but it still falls into the category of voluntarily crippling our tech and keeping it that way. I still think it’s the best shot we have, but I admit it’s hard to imagine how to make sure that situation stays stable;

  • make sure everyone has a veto on AGI use: this is a stricter variation on Altman’s plan, borrowing something from the distributed idea above. He suggests pooling equity shares of AGI capital into a fund from which all Americans draw a basic income (though the risk here is that this doesn’t cover what happens to non-Americans when American companies capture most of the value of their jobs too). My problem with that is that ultimately shares are just pieces of paper. If all the power rests with AGIs, and access to these AGIs is held by a handful of people, then those people effectively hold the power, and the rest is just a pretty façade that can fall if poked for long enough. For the shares plan to work consistently, everyone needs to hold a literal share of control over the use of the AGI itself: for example, a piece of a cryptographic key needed to sign every order given to it. I’m not sure how you could make this work (you need to make sure both that no single individual can straight up freeze the country and that thousands or millions of individuals acting in concert could hold some real power), but if it were applied from the very beginning, it would hopefully remain stable for a long time. I’d still worry, however, about the international aspect, since this would probably only be doable within a single country.

None of these ideas strikes me as fully satisfying, but I tried. I’d like to hear any criticism or other ideas, especially better ones. If there aren’t any realistic paths out of the trap, I think we need to consider whether the utopian visions of a post-scarcity world are mostly wishful thinking, and whether the reality risks being a lot less pleasant.