Make a Movie Showing Alignment Failures

Years ago, when I asked, my ex-girlfriend didn’t want to live forever; not even 200 years as a healthy adult! But then we watched an episode of Black Mirror in which people can be temporarily uploaded into a fun virtual world, and can stay there permanently when they’re near death. Surprisingly, she said she would like to do that instead of dying a natural death.

Some things are only convincing when you think about them in concrete detail, and fiction lets people live through an experience in fine-grained detail. I don’t think my ex-girlfriend would ever have been convinced to extend life to 200 years if she hadn’t been presented with a clear, concrete story where it ended well.

We can use fiction to show alignment failures, creating a better cultural reference point than Terminator or the Sorcerer’s Apprentice. We could even have it funded by FTX (Project #33) to actually make a movie/series.

But humanity has to win, right? So how do we show an alignment failure, let alone multiple alignment failures, if humanity has to win?

With time loops. And everybody loves time loops.

The Plot

Similar to Dave Scum, Alice is just living her life until everyone dies; then time freezes and she can rewind up to one month into the past. After enough time loops, she realizes an AGI optimizing for [X] is behind it, and (after several more time loops) she convinces the developers to implement a patch so it doesn’t kill everyone the way it always does.

So it kills everyone via the nearest unblocked strategy instead.

After a montage of 10-15 patches, everyone dies except Alice, who is kept alive by the AGI (which figured out she was a direct cause of the patches blocking its top-10 instrumental paths). Through [the power of friendship/true love], she figures out how to activate her time-power without dying, and she swears off patches as a potential solution.

She then goes for the strategy of “buying more time” and convinces the developers to halt progress until they have a more robust solution. This buys 3 more months, until the world ends from a different company’s AGI. She repeatedly convinces group after group until one group cannot be convinced despite her best efforts. This buys her a total of 5 years every loop.

Alice then begins tackling the core problems with several groups of people, carrying their results back through time and trying out more robust solutions every few years, until finally they produce an AI that performs a pivotal act, landing in something maybe like the world of 17776, but hopefully better.

After a Long Reflection of [1,000 years], humanity, with AI assistants, solves human ethics. Alice goes back to the beginning of her timeline and starts writing the code for a recursively self-improving AGI, in order to save everyone who died in the previous timeline and to reduce astronomical waste.

Then they all lived happily ever after; the end.

Next Steps

  1. Actually writing up a script or basic story (not me!).

  2. Curating a list of alignment proposals (all of them have failed so far) that the story will show failing.

  3. Figuring out how to engage CCP-backed researchers. We do not want the movie/series to be banned in China if we hope to coordinate on a common narrative with the CCP, so what’s the best way to make it acceptable (and popular) there?

It doesn’t actually have to be popular, just entertaining, and it should illustrate, in a believable fashion, the failure of the ~10 most common alignment proposals that everyone thinks of and, as a bonus, the failure of more clever proposals. If it’s too hard to market or make popular, we really just want to send it to a small set of researchers.

Creating a Common Utopia

Another useful fictional production would depict a shared utopia (or a pivotal act that produces that shared utopia) that the major AI researchers would agree is pretty good. If successful, this could reduce race-to-AGI conditions, since everyone would be pursuing the same end goal. Current candidate utopias include (spoilers for the endings of certain fictions):