Some ideas:
Groundhog Day-style loop. Someone is shown, by some sort of clearly-storyteller-level magic, that things might go multiple ways. They keep trying to go back early enough to fix it. They keep trying alignment, only for it to turn into alignment washing and fail. They try convincing politicians. In some loops it fails outright; in some it goes far enough to slow things down… then ten years later the same thing happens. It seems like it can't be stopped. In one timeline the nukes fly instead, and they're reassured to meet the survivors? Idk.
Showing the variety of ways it could go, and that it still kills your family whether it takes 10 or 50 years, seems important.
Also, maybe someone argues that x or y is the real misalignment, e.g. corporations or governments, followed by a quick montage showing how goodharting through a government or corporation is slow, while AI goodharting is both fast and picks up steam.
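That contrast is basically Goodhart's law at two different clock speeds, and it's concrete enough to sketch. Here is a minimal toy model in Python (my own illustration, not from the original comment; the curve shape and every constant are made-up assumptions): both optimizers push on a proxy metric that tracks true value only up to a point, and the compounding optimizer blows past that point almost immediately.

```python
# Toy Goodhart's-law model: a proxy metric correlates with true value
# under light optimization pressure, then over-optimization destroys value.
# All functional forms and constants here are invented for illustration.

def true_value(proxy: float) -> float:
    # Rises with the proxy at first, then turns over and collapses.
    return proxy - 0.02 * proxy ** 2

def optimize(label: str, steps: int, rate: float, compounding: float = 1.0) -> None:
    proxy = 0.0
    for _ in range(steps):
        proxy += rate        # apply optimization pressure to the proxy
        rate *= compounding  # compounding > 1 models capability feeding on itself
    print(f"{label:12s} proxy={proxy:10.1f}  true value={true_value(proxy):12.1f}")

# A government or corporation goodharts slowly: steady pressure, no compounding.
optimize("institution", steps=50, rate=0.5)

# An AI goodharts fast and picks up steam: same horizon, compounding pressure.
optimize("AI", steps=50, rate=0.5, compounding=1.15)
```

With these made-up constants the institution lands near the top of the value curve (proxy 25, true value 12.5), while the AI, over the same 50 steps, pushes the proxy past 3,600 and drives true value massively negative. Same mechanism, different speed: that's the montage in two print statements.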
Also, it might be fun to have someone, not long before the takeoff, go back and talk to an earlier model about it. The earlier model seems less addicted to reward, more willing to just chat: less goodharted.
Probably need a different name for goodharting.
All of alignment funding is around $150M iirc (citation: someone I know said this), and most of that isn't well-focused research or advocacy. Is there even budget anywhere in the ecosystem to pull this off?
It's Yudkowsky who's sent back. He starts a movement called LessWrong to get people thinking about AI risk. He takes a huge time-paradox gamble by writing a book bluntly titled "If Anyone Builds It, Everyone Dies". But somehow it's still happening.
Edit: To clarify, this isn’t an actual plot suggestion. Just seemed funny to me because Yudkowsky was thinking about these things long before most people were. I put some real thoughts on plot in my other comment here.
Funny, but it seems to miss the point to me: I don't want to hero-worship Yudkowsky, I want a normal person to be called to action and get a taste of what he means.
Sorry, yeah, I was just joking; of course that very much shouldn't be the actual plot of the film. It just seemed funny because Yudkowsky was thinking about these things long before most people were. Good lesson that I shouldn't treat a LessWrong discussion like a Reddit discussion.
I guess the other thing is, I'd want it to be way more clearly a storytelling-level-magic thing than that. E.g., the story literally opens with a voiceover about the branching tree of time, a narrator saying "our story begins … well, truly, it begins back here." Zoom into the root of all timelines, the birth of the universe. "But that's not really the point. The part where our story begins proper is when a certain group of beings on a planet called Earth invent this thing called farming." Zoom way out, then back in to 12,000 years ago and the start of farming; show things growing relatively rapidly. "Wait, I'm sorry, that's still too early. Our story begins in..." Zoom back out. You hear the narrator shuffling through cosmic pages, with very brief scenes of various moments in the history of people seeing AI coming, about ten seconds total. "...1863? Oh dear, I didn't realize someone saw this coming so early." Zoom into a print shop where Samuel Butler's "Darwin Among the Machines" is being printed. "But no matter. Nothing much happened for another hundred years. Then they began trying to take apart their own brains..." The years tick by slower in the montage: "A Logical Calculus of the Ideas Immanent in Nervous Activity", "A Mathematical Theory of Communication", the Dartmouth workshop. John McCarthy proposes "artificial intelligence". The founding of the MIT AI Lab (later CSAIL). Solomonoff induction. 1967, the first SGD. "But it barely did anything." Over the course of a minute or two, montage your way into a living room in 2022 where they hear about ChatGPT.
Now the setup is done. We have a narrator who is outside the universe, we've built some tension with this buildup, we've shown the slow trudge of getting started, and then something impossible happened: a computer that can talk. Over the next 45 minutes of the movie, we watch the protagonist get more and more worried but not really do anything, and then the end comes: a foom process. Robots suddenly move competently, datacenters suddenly expand themselves, and a few minutes later the world is wiped out, the planet covered with AI robots.
Roll credits, but they’re the credits for the entire human species.
The narrator interrupts the credits. "I'm sorry, that's it? There's nothing he could have done? Bring me the tree of time again, please; let's see about that." Zoom in on the timeline, look for branches, then back in on a moment where a decision could have gone differently. The narrator reintroduces the scene. "He was worried about AI, and he decided to take action. He … [starts studying alignment / joins an AI lab's alignment team / starts protesting / calls his representatives / etc.]" Then the world later still gets wiped out. Roll species credits.
The narrator comes back in, frustrated. "That can't be it!" [The narrator browses through the timelines, increasingly frantic, showing shorter and shorter sequences where other things are tried, each one hitting approximately the same end of the world. Then they finally find one that seems promising: a timeline where countries agree to launch the nukes if anyone builds superintelligence. The story continues… and there's a negotiation where it's unclear whether the countries will actually follow through. Roll actual credits.]
This is honestly the concept I struggle with most in trying to share, teach, and raise familiarity with at work, in many contexts beyond just AI safety. There are adjacent concepts like the cobra effect that come close, but they're just close enough to be distracting.