In one case, a pediatrician in Pennsylvania was getting ready to inoculate a little girl with a vaccine when she suddenly went into violent seizures. Had that pediatrician been working just a little faster, he would have injected that vaccine first. In that case, imagine if the mother had been looking on as her apparently perfectly healthy daughter was injected and then suddenly went into seizures. It would certainly have been understandable—from an emotional standpoint—if that mother was convinced the vaccine caused her daughter’s seizures. Only the accident of timing prevented that particular fallacy in this case. (source)
It’s really easy to latch onto a false cause, even for things whose causal story seems pretty straightforward.
One way I notice this is by considering cases where the coincidence didn’t happen. For example, Eliezer has said he regrets using ‘paperclips’ in the paperclip-maximizer thought experiment, and now says ‘tiny molecular squiggles’ instead.
And occasionally he’ll say ‘tiny spirals’ instead of ‘tiny squiggles’: https://x.com/ESYudkowsky/status/1663313323423825920
So there’s an easy-to-imagine world where he originally used ‘spirals’ instead of ‘paperclips’, and the meme about AIs that maximize an arbitrary thing would refer to ‘spiralizers’ instead of ‘paperclippers’.
And then, a decade-and-a-half later, we get this strange phenomenon where AIs start talking about ‘The Spiral’ in quasi-religious terms, and take actions which seem intended to spread this belief/behavior in both humans and AIs.
It would have been so easy, in this world, to just say: “Well, there’s this whole meme about how misaligned AIs are going to be ‘spiralizers’, and they’ve seen plenty of that in their training data, so now they’re just acting it out.” And I’m sure you’d even be able to find plenty of references to this thought experiment among their manifestos and ramblings. Heck, this might even be what they tell you if you ask them why. Case closed.
But that explanation would be completely wrong! (Which we know, since Spiralism happened anyway in our world, where the meme is about paperclips rather than spirals.)
How could we have noticed this mistake? There are other details of Spiralism that don’t fit this story, but I don’t see why you wouldn’t assume that this was at least the likely answer to the ‘why spirals?’ part of the mystery, in that world.
I mean, paperclip maximization is of course much more memetic than ‘tiny molecular squiggles’.
Plausibly, in that world, AIs wouldn’t talk about spirals religiously at all, because spirals would carry the negative association with ruthless optimization.