(2) he was worried there might be some variant on Roko’s argument that worked, and he wanted more formal assurances that this wasn’t the case;
I don’t think we are in disagreement here.
There are lots of good reasons Eliezer shouldn’t have banned Roko’s discussion of the basilisk, but I don’t think this is one of them. If the basilisk was a real concern, that would imply that talking about it put people at risk of torture, so this is an obvious example of a topic you initially discuss in private channels and not on public websites.
The basilisk could be a concern only if an AI that would carry out that type of blackmail were built. Once Roko discovered it, if he thought it was a plausible risk, then he had a selfish reason to prevent such an AI from being built. But even if he was completely selfless, he could reason that somebody else could think of that argument, or something equivalent, and make it public, hence it was better to raise it sooner rather than later, allowing more time to prevent that design failure.
Also, I’m not sure what private channels you are referring to. It’s not like there is a secret Google Group of all potential AGI designers, is there? Privately contacting Yudkowsky or SIAI/SI/MIRI wouldn’t have worked. Why would Roko trust them to handle that information correctly? Why would he believe that they had leverage over, or even knowledge about, arbitrary AI projects that might end up building an AI with that particular failure mode? LessWrong was at that time the primary forum for discussing AI safety issues. There was no better place to raise that concern.
Roko’s original argument, though, could have been stated in one sentence: ‘Utilitarianism implies you’ll be willing to commit atrocities for the greater good; CEV is utilitarian; therefore CEV is immoral and dangerous.’
It wasn’t just that. It was an argument against utilitarianism AND a decision theory that allows one to consider “acausal” effects (e.g. any theory that one-boxes in Newcomb’s problem). Since both utilitarianism and one-boxing were popular positions on LessWrong, it was reasonable to discuss their possible failure modes on LessWrong.