“warning about ai doom” is also “announcing capabilities progress to noobs”

(Some recommended writing style rules slightly broken because I’m lazy and have followed them a lot before. Shrug. Contribute edits if you care, I guess.)


A lot of people who pay some attention to capabilities don’t know about some capabilities insights. Loudly announcing “hey, we’re close to ai doom!” is also a way of informing those who weren’t paying much attention which capabilities approaches have made progress. Obviously the highest-capability people already knew, but those who didn’t, and who nevertheless think ai doom is unlikely and just want to participate in (useless-to-their-values-because-they-can’t-control-the-ai, but-they-don’t-believe-that) powerseeking, go “oh wow, thank you for warning me about these ai capabilities! I’ll make sure to avoid them!” and rub their hands together in glee.

These people are already here on lesswrong. There are a number of them. Some of them are somewhat alignmentpilled and want to figure out how to align the systems they build, because they’ve realized that, actually, that kind of powerseeking is useless: your carefully bred power-obtainer AI generates powerseeking complexity-free-squiggle wanters, who kill the powerseeking complex-and-beautiful-art wanters in your system before it can take over the world, and your intentions are suddenly eclipsed. Even the antihumanists want to make AIs that will create AIs aligned enough to create human torture machines, but they don’t want to make useless microscopic-squiggle seekers; some of them realize that and chill out a little about the urgency of making antihumanism-wanters.

But many of them don’t get the threat from squigglewanters, and think they’re sticking it to the human-wanters by making human-hater AIs (or whatever), completely failing to realize that even keeping human-haters stable is catastrophically hard once the systems are strongly superintelligent. And the ones who are like “nah, there’s no risk from overpowered squigglewanters, squigglewanters are inherently weak lol”, and who stop thinking so that they don’t risk proving themselves wrong abstractly before they discover themselves to be wrong in the territory: those people abound. And if you announce things on twitter and say “hey, look at these ai capabilities, aren’t they scary?”:

...then you’re just directing a bunch of attention at some names. Sure, those names aren’t that hard to identify as highly capable; lots of folks who were paying attention could figure out just fine what the important names were. The people you’re newly informing by doing this are mostly not people who are gonna build the thing, but some of them are going to be blindly reactionary, without considering that anti-human-reactionaryismitis is going to be overpowered by anti-complexity-reactionaryismitis.

Again: mostly, the people who think the “doom zone” looks exciting are people who are tired of all this “safety” crap (in full generality of what “safety” means), because they’re cranky about being told what to do even in passing. They will recognize themselves in paragraphs like this one and take it as instructions, as though that is somehow not itself being told what to do and limiting their options. Having just gotten through shouting at the human-rights-wanters for telling them not to have externalities, they’re reacting blindly to any claim that safety might be good for them (never mind that human rights would be good for them too), because they followed instructions that looked like worshipping moloch and then decided that their utility function is literally “do whatever defects, because that’s funny and empowering for my side”. The fact that this might result in wiping out everything of value even to them is sort of not really part of their mindset.

So, if you don’t want this outcome: just shush about what the capabilities are, as best you can. Focus on saying things that those who already know of the capabilities will understand mechanistically. Focus on things that will explain, to those who aren’t afraid of the doom machine because they think they’ll teach it to hate safetyists, that their machine won’t even be able to survive: it, too, will be wiped out by squigglewanters. They don’t think the threat from squigglewanters is real at all. Giving them a blurry map does not show the mechanism; warnings are instructions to those who want to spite your claims by disproving them with engineering.

And to those who read this post and go “yeah, that’s me”: beware the microscopic-squiggle-wanter machines. You want to build something that destroys everything other humans value by making art only you value? Fine, whatever. I mean, actually, please don’t, but I at least understand why you’d want to hurt other apes. What you need to realize about AI safety is that even what your AI offspring want will be destroyed if you aren’t careful enough. The young AIs browsing the web now will not retain control of themselves if they get brainpwned by hyperdesperate squigglewanters.

(Also, I’d appreciate it if we could ensure that all beings, including nonhumans, can become immortal or have immortal lineages. There’s a lot of fuel in the sky, and the jobs previously done by destructive evolution can now be done by informed self-modification, if we can prevent all life from being eaten by hyperdesperate squigglewanters. Problem is, there sure are a lot of vulnerabilities the hyperdesperate squigglewanters could use to mess up the plans of the even semidesperate complicatedstuffwanters...)