A few misconceptions surrounding Roko’s basilisk

There’s a new Less Wrong wiki page on the Roko’s basilisk thought experiment, discussing both Roko’s original post and the fallout from Eliezer Yudkowsky’s decision to ban the topic on Less Wrong discussion threads. The wiki page, I hope, will reduce how much people have to rely on speculation or reconstruction to make sense of the arguments.

While I’m on this topic, I want to highlight points that I see omitted or misunderstood in some online discussions of Roko’s basilisk. The first point that people writing about Roko’s post often neglect is:

  • Roko’s arguments were originally posted to Less Wrong, but they weren’t generally accepted by other Less Wrong users.

Less Wrong is a community blog, and anyone who has a few karma points can post their own content here. Having your post show up on Less Wrong doesn’t require that anyone else endorse it. Roko’s basic points were promptly rejected by other commenters on Less Wrong, and not much seems to have come of them as ideas. People who bring up the basilisk on other sites don’t seem to be super interested in the specific claims Roko made either; discussions tend to gravitate toward various older ideas that Roko cited (e.g., timeless decision theory (TDT) and coherent extrapolated volition (CEV)) or toward Eliezer’s controversial moderation action.

In July 2014, David Auerbach wrote a Slate piece criticizing Less Wrong users and describing them as “freaked out by Roko’s Basilisk.” Auerbach wrote, “Believing in Roko’s Basilisk may simply be a ‘referendum on autism.’” I take this to mean he thinks a significant number of Less Wrong users accept Roko’s reasoning, and that they do so because they’re autistic (!). But the Auerbach piece glosses over the question of how many Less Wrong users (if any) in fact believe in Roko’s basilisk. Which seems somewhat relevant to his argument...?

The idea that Roko’s thought experiment holds sway over some community or subculture seems to be part of a mythology that’s grown out of attempts to reconstruct the original chain of events, and a big part of the blame for that mythology’s existence lies with Less Wrong’s moderation policies. Because the discussion topic was banned for several years, Less Wrong users themselves had little opportunity to explain their views or address misconceptions. A stew of rumors and partly understood forum logs then congealed into the attempts by people on RationalWiki, Slate, etc. to make sense of what had happened.

I gather that the main reason people thought Less Wrong users were “freaked out” about Roko’s argument was that Eliezer deleted Roko’s post and banned further discussion of the topic. Eliezer has since sketched out his thought process on Reddit:

When Roko posted about the Basilisk, I very fool­ishly yel­led at him, called him an idiot, and then deleted the post. [...] Why I yel­led at Roko: Be­cause I was caught flat­footed in sur­prise, be­cause I was in­dig­nant to the point of gen­uine emo­tional shock, at the con­cept that some­body who thought they’d in­vented a brilli­ant idea that would cause fu­ture AIs to tor­ture peo­ple who had the thought, had promptly posted it to the pub­lic In­ter­net. In the course of yel­ling at Roko to ex­plain why this was a bad thing, I made the fur­ther er­ror—keep­ing in mind that I had ab­solutely no idea that any of this would ever blow up the way it did, if I had I would ob­vi­ously have kept my fingers quies­cent—of not mak­ing it ab­solutely clear us­ing lengthy dis­claimers that my yel­ling did not mean that I be­lieved Roko was right about CEV-based agents [= Eliezer’s early model of in­di­rectly nor­ma­tive agents that rea­son with ideal ag­gre­gated prefer­ences] tor­tur­ing peo­ple who had heard about Roko’s idea. [...] What I con­sid­ered to be ob­vi­ous com­mon sense was that you did not spread po­ten­tial in­for­ma­tion haz­ards be­cause it would be a crappy thing to do to some­one. The prob­lem wasn’t Roko’s post it­self, about CEV, be­ing cor­rect.

This, obviously, was a bad strategy on Eliezer’s part. Looking at the options in hindsight: To the extent it seemed plausible that Roko’s argument could be modified and repaired, Eliezer shouldn’t have used Roko’s post as a teaching moment and loudly chastised him on a public discussion thread. To the extent this didn’t seem plausible (or ceased to seem plausible after a bit more analysis), continuing to ban the topic was a (demonstrably) ineffective way to communicate the general importance of handling real information hazards with care.

On that note, point number two:

  • Roko’s argument wasn’t an attempt to get people to donate to Friendly AI (FAI) research. In fact, the opposite is true.

Roko’s original argument was not ‘the AI agent will torture you if you don’t donate, therefore you should help build such an agent’; his argument was ‘the AI agent will torture you if you don’t donate, therefore we should avoid ever building such an agent.’ As Gerard noted in the ensuing discussion thread, threats of torture “would motivate people to form a bloodthirsty pitchfork-wielding mob storming the gates of SIAI [= MIRI] rather than contribute more money.” To which Roko replied: “Right, and I am on the side of the mob with pitchforks. I think it would be a good idea to change the current proposed FAI content from CEV to something that can’t use negative incentives on x-risk reducers.”

Roko saw his own argument as a strike against building the kind of software agent Eliezer had in mind. Other Less Wrong users, meanwhile, rejected Roko’s argument both as a reason to oppose AI safety efforts and as a reason to support AI safety efforts.

Roko’s argument was fairly dense, and it continued into the discussion thread. I’m guessing that this (in combination with the temptation to round off weird ideas to the nearest religious trope, plus misunderstanding #1 above) is why RationalWiki’s version of Roko’s basilisk gets introduced as

a futurist version of Pascal’s wager; an argument used to try and suggest people should subscribe to particular singularitarian ideas, or even donate money to them, by weighing up the prospect of punishment versus reward.

If I’m correctly reconstructing the sequence of events: Sites like RationalWiki report in the passive voice that the basilisk is “an argument used” for this purpose, yet no examples ever get cited of someone actually using Roko’s argument in this way. Via citogenesis, the claim then gets incorporated into other sites’ reporting.

(E.g., in Outer Places: “Roko is claiming that we should all be working to appease an omnipotent AI, even though we have no idea if it will ever exist, simply because the consequences of defying it would be so great.” Or in Business Insider: “So, the moral of this story: You better help the robots make the world a better place, because if the robots find out you didn’t help make the world a better place, then they’re going to kill you for preventing them from making the world a better place.”)

In terms of argument structure, the confusion is equating the conditional statement ‘P implies Q’ with the argument ‘P; therefore Q.’ Someone asserting the conditional isn’t necessarily arguing for Q; they may be arguing against P (based on the premise that Q is false), or they may be agnostic between those two possibilities. And misreporting which argument was made (or who made it) is kind of a big deal in this case: ‘Bob used a bad philosophy argument to try to extort money from people’ is a much more serious charge than ‘Bob owns a blog where someone once posted a bad philosophy argument.’
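To make the distinction concrete, here is a small illustrative sketch (not taken from Roko’s post or the discussion thread) that brute-forces the truth table for two propositions, P and Q, and checks what actually follows from the conditional on its own. The helper names (`implies`, `models`, `entails`, and the premise functions) are hypothetical; the point is only that ‘P implies Q’ is compatible with Q being false, and becomes an argument for Q only once you add P, or an argument against P once you add not-Q.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material conditional: 'P implies Q' is false only when P is true and Q is false."""
    return (not p) or q


def models(*premises):
    """Every (P, Q) truth assignment that satisfies all of the given premises."""
    return [
        (p, q)
        for p, q in product([True, False], repeat=2)
        if all(premise(p, q) for premise in premises)
    ]


def entails(conclusion, *premises) -> bool:
    """True if the conclusion holds in every assignment that satisfies the premises."""
    return all(conclusion(p, q) for p, q in models(*premises))


# Hypothetical premises and conclusions, each a function of the truth values of P and Q.
def conditional(p, q):
    return implies(p, q)  # 'P implies Q'


def premise_p(p, q):
    return p  # 'P'


def premise_not_q(p, q):
    return not q  # 'not Q'


def conclusion_q(p, q):
    return q  # 'Q'


def conclusion_not_p(p, q):
    return not p  # 'not P'


print(entails(conclusion_q, conditional))                     # False: the conditional alone leaves Q open
print(entails(conclusion_q, conditional, premise_p))          # True: 'P; therefore Q' (modus ponens)
print(entails(conclusion_not_p, conditional, premise_not_q))  # True: 'not Q; therefore not P' (modus tollens)
```

The last two checks are modus ponens and modus tollens: the same conditional supports opposite conclusions depending on which extra premise it is paired with, which is why reporting the conditional as if it were ‘P; therefore Q’ misstates the argument.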


  • “Formally speaking, what is correct decision-making?” is an important open question in philosophy and computer science, and formalizing precommitment is an important part of that question.

Moving past Roko’s argument itself, a number of discussions of this topic risk misrepresenting the debate’s genre. Articles on Slate and RationalWiki strike an informal tone, and that tone can be useful for getting people thinking about interesting science/philosophy debates. On the other hand, if you’re going to dismiss a question as unimportant or weird, it’s important not to give the impression that working decision theorists are similarly dismissive.

What if your devastating takedown of string theory is intended for consumption by people who have never heard of ‘string theory’ before? Even if you’re sure string theory is hogwash, then, you should be wary of giving the impression that the only people discussing string theory are the commenters on a recreational physics forum. Good reporting by non-professionals, whether or not they take an editorial stance on the topic, should make it obvious that there’s academic disagreement about which approach to Newcomblike problems is the right one. The same holds for disagreement about topics like long-term AI risk or machine ethics.
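For readers who have never seen a Newcomblike problem, a rough sketch of why the disagreement is substantive may help. It assumes the standard textbook setup of Newcomb’s problem (a transparent box holding $1,000, and an opaque box holding $1,000,000 only if a predictor expected you to take just the opaque box), which this post doesn’t spell out. The function names are hypothetical, and the two calculations are simplified versions of evidential and causal expected-value reasoning, not full statements of either theory (or of TDT).

```python
# Newcomb's problem with standard textbook payoffs (assumed; the post gives none).
TRANSPARENT_BOX = 1_000      # the visible box always holds $1,000
OPAQUE_BOX_FULL = 1_000_000  # the opaque box holds $1,000,000 only if one-boxing was predicted


def evidential_values(accuracy: float) -> dict:
    """Evidential-style expected value: your choice counts as evidence about the prediction."""
    return {
        "one-box": accuracy * OPAQUE_BOX_FULL,
        "two-box": (1 - accuracy) * OPAQUE_BOX_FULL + TRANSPARENT_BOX,
    }


def causal_values(prob_full: float) -> dict:
    """Causal-style expected value: the box contents are already fixed, whatever you choose."""
    return {
        "one-box": prob_full * OPAQUE_BOX_FULL,
        "two-box": prob_full * OPAQUE_BOX_FULL + TRANSPARENT_BOX,
    }


def best(values: dict) -> str:
    """Return the option with the highest expected value."""
    return max(values, key=values.get)


for accuracy in (0.5, 0.9, 0.99):
    print(f"predictor accuracy {accuracy}: evidential pick = {best(evidential_values(accuracy))}")
for prob_full in (0.1, 0.5, 0.9):
    print(f"P(opaque box full) {prob_full}: causal pick = {best(causal_values(prob_full))}")
```

With these payoffs, the evidential calculation favors one-boxing whenever the predictor’s accuracy exceeds 0.5005, while the causal calculation favors two-boxing for every probability that the opaque box is full, since it treats the contents as fixed and two-boxing always adds $1,000. Which of these calculations (if either) describes correct decision-making is the kind of open question decision theorists still disagree about.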

If Roko’s original post is of any pedagogical use, it’s as an unsuccessful but imaginative stab at drawing out the diverging consequences of our current theories of rationality and goal-directed behavior. Good resources for these issues (both for discussion on Less Wrong and elsewhere) include:

The Roko’s basilisk ban isn’t in effect anymore, so you’re welcome to direct people here (or to the Roko’s basilisk wiki page, which also briefly introduces the relevant issues in decision theory) if they ask about it. Particularly low-quality discussions can still get deleted (or politely discouraged), though, at moderators’ discretion. If anything here was unclear, you can ask more questions in the comments below.