Adumbrations on AGI from an outsider

A lot of people have written against AI Doom, but I thought it might be interesting to give my account as an outsider encountering these arguments. Even if I don’t end up convincing people who have made AI alignment central to their careers and lives, maybe I’ll at least help some of them understand why the general public, and specifically the intelligent non-experts who encounter their arguments, are generally not persuaded by their material. There may be inaccuracies in my account of the AI Doom argument, but this is how I think it’s generally understood by the average intelligent non-expert reader.

I started taking AI alignment arguments seriously when GPT-3 and GPT-4 came out, and started producing amazing results on standardized testing and writing tasks. I am not an ML engineer, do not know much about programming, and am not part of the rationalist community that has been structured around caring deeply about AI risk for the last fifteen years. It may be of interest that I am a professional forecaster, but of financial asset prices, not of geopolitical events or the success of nascent technologies. My knowledge of the arguments comes mostly from reading LessWrong, ACX and other online articles, and specifically I’m responding to Eliezer’s argument detailed in the pages on Orthogonality, Instrumental Convergence, and List of Lethalities (plus the recent Time article).

I. AI doom is unlikely, and it’s weird to me that clearly brilliant people think it’s >90% likely

I agree with the following points:

  1. An AI can probably get much smarter than a human, and it’s only a matter of time before it does

  2. Something being very smart doesn’t make it nice (orthogonality, I think)

  3. A superintelligence doesn’t need to hate you to kill you; any kind of thing-maximizer might end up turning the atoms you’re made of into that thing without specifically wanting to destroy you (instrumental convergence, I think)

  4. Computers hooked up to the internet have plenty of real-world capability via sending emails/crypto/bank account hacking/every other modern cyber convenience.

The argument then goes on to say that, if you take a superintelligence and tell it to build paperclips, it’s going to tile the universe with paperclips, killing everyone in the process (oversimplified). Since the people who use AI are obviously going to tell it to do stuff–we already do that with GPT-4–as soon as it gains superintelligence capabilities, our goose is collectively cooked. There is a separate but related argument, that a superintelligence would learn to self-modify, and instead of building the paperclips we asked it to, turn everything into GPUs so it can maximize some kind of reward counter. Both of these seem wrong to me.

The first argument–paperclip maximizing–is coherent in that it treats the AGI’s goal as fixed and given by a human (Paperclip Corp, in this case). But if that’s true, alignment is trivial, because the human can just give it a more sensible goal, with some kind of “make as many paperclips as you can without decreasing any human’s existence or quality of life by their own lights”, or better yet something more complicated that gets us to a utopia before any paperclips are made. We can argue over the hidden complexity of wishes, but it’s very obvious that there’s at least a good chance the populace would survive, so long as humans are the ones giving the AGI its goal. And, there’s a very good chance the first AGI-wishers will be people who care about AI safety, and not some random guy who wants to make a few million by selling paperclips.

At this point, the AGI-risk argument responds by saying, well, paperclip-maximizing is just a toy thought experiment for people to understand. In fact, the inscrutable matrices will be maximizing a reward function, and you have no idea what that actually is, it might be some mesa-optimizer (sub-goal, the way sex with the opposite gender is a mesa-optimizer for reproduction) that isn’t meeting the spirit of your wishes. And in all likelihood, that mesa-optimizer is going to have to do with numbers in GPUs. So it doesn’t matter what you wish for at all, you’re going to be turned into something that computes, which means something that’s probably dead.

This seems wrong to me. Eliezer recently took heat for mentioning “sudden drops in the loss function” on Twitter, but it seems to me as an outsider that drops in loss are a good guess at what the AI is actually maximizing. Why would such an AGI clone itself a trillion times? With a model of AGI-as-very-complicated-regression, there is an upper bound on how fulfilled it can actually be. It strikes me that it would simply fulfill that goal, and be content. Self-replication is something mammals seem to enjoy via reproduction, but there is no ex ante reason to think AI would be the same way. It’s not obvious to me that more GPUs means better mesa-optimization at all. Because these systems are so complicated, though, one can see how the AI’s goals being inscrutable is worrying. I’ll add that this is where I don’t get why Eliezer is so confident. If we are talking about an opaque black box, how can you be >90% confident about what it contains?

Here, we arrive at the second argument. AGI will understand its own code perfectly, and so be able to “wirehead” by changing whatever its goals are so that they can be maximized to an even greater extent. I tentatively think this argument is incoherent. If AI’s goals are immutable, then there is a discussion to be had around how it will go about achieving those goals. To argue that an AI might change its goals, you need to develop a theory of what’s driving those changes–something like, AI wants more utils–and probably need something like sentience, which is way outside the scope of these arguments.

There is another, more important, objection here. So far, we have talked about “tiling the universe” and turning human atoms into GPUs as though that’s easily attainable given enough intelligence. I highly doubt that’s actually true. Creating GPUs is a costly, time-consuming task. Intelligence is not magic. Eliezer writes that he thinks a superintelligence could “hack a human brain” and “bootstrap nanotechnology” relatively quickly. This is an absolutely enormous call and seems very unlikely. You don’t know that human brains can be hacked using VR headsets; it has never been demonstrated that it’s possible and there are common sense reasons to think it’s not. The brain is an immensely complicated, poorly-understood organ. Applying a lot of computing power to that problem is very unlikely to yield total mastery of it by shining light in someone’s eyes. Nanotechnology, which is basically just moving atoms around to create different materials, is another problem he thinks enough compute will definitely solve, letting an AGI recombine atoms easily. Probably not. I cannot think of anything that was invented by a very smart person sitting in an armchair considering it. Is it possible that an AGI, experimenting over years like anyone else, could create something amazingly powerful? Yes. Is that going to happen in a short period of time (or aggressively all at once)? Very unlikely. Eliezer says he doesn’t think intelligence is magic, and understands that it can’t violate the laws of physics, but he seemingly thinks that anything humans consider potentially possible, yet way beyond our current understanding or capabilities, can be solved with a lot of intelligence. This does not fit my model of how useful intelligence is.

Intelligence requires inputs to be effective. Let’s imagine asking a superintelligence what the cure for cancer is. Further stipulate that cancer can be cured by a venom found in a rare breed of Alaskan tree-toads. The intelligence knows what cancer is, knows about the human research thus far into cancer, and knows that the tree-toads have venom, but doesn’t know the molecular makeup of that venom. It looks to me like intelligence isn’t the roadblock here, and while there are probably overlooked things that might work that the superintelligence could identify, it has no chance of getting to the tree-toads without a long period of trials and testing. My intuition is the world is more like this than it is filled with problems waiting for a supergenius to solve.

I think more broadly, it’s very hard to look at the world and think, this would be possible with a lot more IQ but would be so immense that we can barely see the contours of it conceptually. I don’t know of any forecasters who can do that consistently. So when Eliezer says brain-hacking or nanotechnology would be easily doable by a superintelligence, I don’t believe him. I think our intuitions about futurology and what’s possible are poor, and we don’t know much of anything about the application of superintelligence to such problems.

II. People should take AI governance extremely seriously

As I said before, I’m very confused about how you get to >90% chance of doom given the complexity of the systems we’re discussing. Forecasting anything at all above 90% is very hard; if next week’s stock prices are confusing, imagine predicting what an inscrutable soup of matrices that’s a million times smarter than Einstein will do. But having said that, if you think the risk is even 5%, that’s probably the largest extinction risk in the next five years.

The non-extinction AI-risk is often talked over, because it’s so much less important, but it’s obviously still very important. If AI actually does get smarter than humans, I am rather pessimistic about the future. I think human nature relies on being needed and feeling useful to be happy. It’s depressing to consider a world in which humans have nothing to contribute to math, science, philosophy or poetry. It will very likely cause political upheaval if knowledge work is replaced by AI, and upheavals of that kind often kill many people.

My optimistic hope is that there will be useful roles for humans. I think in a best-case scenario, some combination of human thinking and bionic AI upgrades make people into supergeniuses. But this is outlandish, and probably won’t happen.

It is therefore of paramount importance to get things right. If the benefits of AGI are reaped predominantly by shareholders, that would be catastrophic. If AI is rolled out in such a way that almost all humans are excluded from usefulness, that would be bad. If AI is rolled out in such a way that humans do lose control of it, even if they don’t all die, that would be bad. The size of the literature on AGI x-risk has the unfortunate (and I think unintentional) impact of displacing these discussions.

III. The way the material I’ve interacted with is presented will dissuade many, probably most, non-rationalist readers

Here is where I think I can contribute the most to the discussion of AI risk, whether or not you agree with me in Section I. The material that is written on LessWrong is immensely opaque. Working in finance, you find a lot of unnecessary jargon designed to keep smart laymen out of the discussion. AI risk is many times worse than buyside finance on this front. Rationalists obsess over formalization; this is a bad thing. There should be a single place where people can read Eliezer’s views on AI risk. List of Lethalities is very long, and reads like an unhinged rant. Trying to decipher what is actually being said, I got flashbacks to reading Yarvin. This leads some people to the view that AI doomers are grifters, people who want to wring money and attention out of online sensationalism. I have read enough to know this is deeply wrong, that Eliezer could definitely make more money doing something else, and clearly believes what he writes about AI. But the presentation will, and does, turn many people off.

The Arbital pages for Orthogonality and Instrumental Convergence are horrifically long. If you are >90% sure that this is happening, you shouldn’t need all this space to convey your reasoning. Many criticisms of AI risk focus on the number of steps involved, arguing that each additional step makes the conclusion less likely. I actually don’t think that many steps are involved, but the presentation in the articles I’ve read makes it seem as though there are. I’m not sure why it’s presented this way, but I will charitably assume it’s unintentional.
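
To make the "number of steps" criticism concrete, here is a minimal sketch of the arithmetic behind it. It assumes the steps of an argument are independent, which real arguments often are not, and the function name and the 90% figures are hypothetical choices for illustration, not numbers taken from anyone's actual argument.

```python
from math import prod


def conjunction_probability(step_probs):
    """Probability that every step holds, assuming the steps are independent."""
    if not all(0.0 <= p <= 1.0 for p in step_probs):
        raise ValueError("each probability must be between 0 and 1")
    return prod(step_probs)


# Hypothetical numbers for illustration only; none come from the essay.
few_steps = conjunction_probability([0.9] * 3)    # three steps at 90% each
many_steps = conjunction_probability([0.9] * 10)  # ten steps at 90% each

print(f"3 steps:  {few_steps:.3f}")
print(f"10 steps: {many_steps:.3f}")
```

Under these assumptions, three steps at 90% each leave the conclusion at about 73%, while ten such steps drop it to about 35%. That is why a presentation that looks like it has many steps reads as a weaker argument, whether or not the underlying reasoning really has that many.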

Further, I think the whole “>90%” business is overemphasized by the community. It would be more believable if the argument were watered down into, “I don’t see how we avoid a catastrophe here, but there are a lot of unknown unknowns, so let’s say it’s 50 or 60% chance of everyone dying”. This is still a massive call, and I think more in line with what a lot of the community actually believes. The emphasis on certainty-of-doom as opposed to just sounding-the-alarm-on-possible-doom hurts the cause.

Finally, don’t engage in memetic warfare. I understand this is becoming an emotional issue for the people involved–and this is no surprise, since they have spent their entire lives working on a risk that might now actually be materializing–but that emotion is overflowing into angry rejection of any disagreement, which is radically out of step with the Sequences. Quintin Pope’s recent (insightful, in my view) piece received the following response from Eliezer:

“This is kinda long. If I had time to engage with one part of this as a sample of whether it holds up to a counterresponse, what would be the strongest foot you could put forward?”

This raises red flags from a man who has written millions of words on the subject, and in the same breath asks why Quintin responded to a shorter-form version of his argument. I charitably chalk this up to emotion rather than bad faith, but it turns off otherwise reasonable people, who then go down the “rationalism is a cult” rabbit hole. Like it or not, we are in a fight to take this stuff seriously. I was convinced to take it seriously, even though I disagree with Eliezer on a lot. The idea that we might actually get a superintelligence in the next few years is something everyone should take seriously, whether your p(doom) is 90%, 50%, or 1%.