Eliezer is still ridiculously optimistic about AI risk

All power to the crypto channel Bankless for their interview with Eliezer.


They actually take his arguments seriously. If I wanted to blow my life savings on some wretched crypto scam, I'd certainly listen to these guys about which scam was the best one to fall for.


Eliezer is visibly broken. Visibly ill.

This is what it looks like when the great hero of humanity, who has always been remarkably genre-savvy, realises that the movie he’s in is ‘Lovecraft-style Existential Cosmic Horror’, rather than ‘Rationalist Harry Potter Fanfic’.

No happy ending. Just madness and despair.

All power to Eliezer for having had a go. What sort of fool gives up before he’s actually lost?

There are configurations of matter where something that remembers being me knows that the world only survived because Eliezer tried.

I do hope that my future self has the good grace to build at least one hundred-foot gold statue of Eliezer.

But that’s not the way to bet.


There’s almost nothing new here.

The only bit that surprised me was when he says that if you give ChatGPT the right prompt, it can do long multiplication. I went and looked that up. It seems to be true.

Oh great, the fucking thing can execute arbitrary code, can it?

Don’t worry, it’s just a next-token predictor.


But Eliezer, despite the fact that he is close to tears while explaining, for the eight hundredth time, why we are all going to die, is still ridiculously optimistic about how difficult the alignment problem is.

This is a direct quote:

> If we got unlimited free retries and 50 years to solve everything, it’d be okay.
> We could figure out how to align AI in 50 years given unlimited retries.

No, we couldn't. This is just a sentient version of the Outcome Pump, ably described by Eliezer himself sixteen years ago: https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes

What happens in this Groundhog Day scenario?

It depends very much on what the reset condition is.

If the reset is unconditional, then we just replay the next fifty years forever.

But the replays are not exact: indeterminacy and chaos mean that things go differently every time.

Almost always we build the superintelligence, and then we all die.

Every so often we fail to even do that for some reason (nuclear war is probably the most likely one).

Very, very rarely we somehow manage to create something slightly aligned or slightly misaligned, and the universe becomes either insanely great in a really disappointing way, or an unspeakable horror that really doesn't bear thinking about.

But whatever happens, it lasts fifty years and then it ends and resets.

Except, of course, if the AI notices the reset mechanism and disables it somehow.


OK, I can’t imagine that that’s what Eliezer meant either, but what could he have meant?

Let’s suppose that the reset mechanism is somehow outside our universe. It resets this universe, but this universe can’t alter it.

(It can’t be truly epiphenomenal, because it has to see this universe in order to work out whether to reset it. So there’s a two-way causal connection. Let’s just handwave that bit away.)

And further suppose that the reset mechanism can condition on absolutely anything you want.

Then this is just the alignment problem all over again. What’s your wish?


Suppose the reset happens if more than a billion people die against their will on any given day.

Then the surviving universes probably mostly contain a deadly plague that took a few years to kill everyone.
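If you want the point in its most literal form, here's a toy sketch. It assumes that a "history" is nothing more than a list of daily death counts, that the world population is a round 8 billion, and that the reset condition is checked exactly as worded. The `plague` helper and the two example histories are made up purely for illustration, and "against their will" isn't modelled at all. The fast plague trips the wire and gets erased. The slow one kills exactly as many people and sails straight through.

```python
# Toy model: a reset condition is just a literal predicate over a history.
# Everything here is hypothetical; a "history" is only a list of daily death counts.

BILLION = 1_000_000_000
WORLD_POPULATION = 8 * BILLION  # rough round-number assumption


def triggers_reset(daily_deaths: list[int]) -> bool:
    """The wish as literally stated: reset if more than a billion die on any single day."""
    return any(deaths > BILLION for deaths in daily_deaths)


def plague(days_to_kill_everyone: int) -> list[int]:
    """Hypothetical plague that kills the whole population evenly over the given number of days."""
    per_day, remainder = divmod(WORLD_POPULATION, days_to_kill_everyone)
    deaths = [per_day] * days_to_kill_everyone
    deaths[-1] += remainder  # the last day mops up whoever is left
    return deaths


histories = {
    "fast plague (1 week)": plague(7),
    "slow plague (3 years)": plague(3 * 365),
}

# The universes that "survive" are the ones the predicate lets through,
# not the ones where anybody actually survives.
survivors = [name for name, deaths in histories.items() if not triggers_reset(deaths)]
print(survivors)
```

Both histories end with everyone dead. The filter only cares how quickly it happens.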


Suppose the reset happens if Eliezer is not happy about something that happens.

Then the surviving universes look like ones where Eliezer got killed too fast to register his unhappiness.


Etc, etc. I’m not going to labour the point, because Eliezer himself made it so clearly in his original essay.

If you can say what the reset condition should be, you’ve already solved the hard part of the alignment problem. All that’s left is the comparatively easy part of the task where you have to formalize the reset condition, work out how to put that formal goal into an AI, and then build an AI with goals that stay stable under recursive self-improvement.

Which, I agree, probably is the sort of thing that a very bright team of properly cautious people in a sane world *might* have a plastic cat in hell’s chance of working out in fifty years.