A stubborn unbeliever finally gets the depth of the AI alignment problem

Link post

I realise posting this here might be preaching to the converted, but I think it could be interesting for some people to see the perspective of someone who was slow to get on board with worrying about AI alignment.

I’m one of those people who find it hard to believe that misaligned Artificial General Intelligence (AGI) could destroy the world. Even though I’ve understood the main arguments and can’t satisfyingly refute them, a part of my intuition won’t easily accept that it’s an impending existential threat. I work on deploying AI algorithms in industry, so I have an idea of both how powerful and how limited they can be. I also get why AI safety in general should be taken seriously, but I struggle to feel the requisite dread.

The best reason I can find for my view is that there is a lot of “Thinkism” in arguments for AGI takeoff. Any AGI that wants to have an influence outside of cyberspace, e.g. by building nanobots or a novel virus, will ultimately run into problems of computational irreducibility: it isn’t possible to model everything accurately, so empirical work in the physical world will always be necessary. These kinds of experiments are slow, messy and resource-intensive. So any AGI is going to reach some limits when it tries to influence the physical world. I do realise there are loads of ways an AGI can cause a lot of damage without requiring the invention of new physical technologies, but in my mind this still slowed things down enough for me to worry less about alignment issues.

That was, until I started realising that alignment problems aren’t limited to the world of AI. If you look around, you can see them everywhere. The most obvious example is climate change: there is a clear misalignment between the motivations of the petroleum industry and the long-term future of humanity, which causes a catastrophic problem. The corporate world is full of such alignment problems, from the tobacco industry misleading the public about the harms of smoking to social media companies hijacking our attention.

It was exploring the problems caused by social media that helped me grasp the scale of the issue. I wrote an essay to understand why I was spending so much time browsing the internet without having much to show for it or really enjoying the experience. You can read the full essay here, but the main takeaway for AI safety is that we can’t even deploy simple AI algorithms at scale without causing big societal problems. If we can’t manage in this easy case, how can we possibly expect to deal with more powerful algorithms?

The issue of climate change is even slower-acting and more problematic than social media. There’s also a clear scientific consensus, with a lot of public understanding about how bad it is, yet we still aren’t able to respond in a decisive and rational manner. Realising this has finally driven home how much of an issue AGI misalignment is going to be. Even if it doesn’t happen at singularity-inducing speeds, it’s going to be incredibly destabilising and difficult to deal with. And even if the AGIs themselves could be aligned, we have to seriously worry about aligning the people who deploy them.

I think there might be a silver lining though. As outlined above, I am suspicious of solutions that look like Thinkism and aren’t tested in the real world. However, as there are a whole bunch of existing alignment problems waiting to be resolved, they could act as real-world testing grounds before we run into a serious AGI alignment issue. I personally believe the misalignment of social media companies with their users could be a good place to start. It would be very informative to try to build machine learning algorithms for large-scale content recommendation that give people a feeling of flourishing on the internet, rather than one of time-wasting and doomscrolling. You can read more details of my specific thoughts about this problem in my essay.

As a final bonus, I think there was something else that made it difficult for me to grok the AI alignment problem: I find it hard to intuitively model psychopathic actors. Even though I know an AI wouldn’t think like a human, if I try to imagine how it might think, I still end up giving it a human thought process. I finally managed to break this intuition by reading Ted Chiang’s great short story, Understand. I recommend that anyone who hasn’t read it do so with AI in mind. It really gives you a feel for the perspective of a misaligned super-intelligence. Unfortunately, I think for the ending to work out in the same way, we’d have to crack the alignment problem first.

So, now that this non-believer has been converted, I finally feel on board with all the panic, which hasn’t been helped by the insane progress in AI capabilities this year. It’s time to start thinking about this more seriously…

I’m sure I’m not the first to have these thoughts, so if you can share any links below for further reading, it would be appreciated.

Crossposted to EA Forum (32 points, 7 comments)