Epistemic Status: Rant. Very rapidly written, and upon reflection I am uncertain whether I fully endorse it; Cunningham’s Law says that this is the best way to get good takes quickly.
Rationalists should win. If you have contorted yourself into alternative decision theories that leave you vulnerable to Roko’s Basilisk or whatever, and normal CDT or whatever actual humans implement in real life wouldn’t leave you vulnerable to stuff like this, then you have failed, and you need to go back to trying to be a normal person using normal decision procedures instead of mathing your way into being “forever acausally tortured by a powerful intelligent robot.”
If the average Joe on the street would not succumb to having their mind hacked by Eliezer Yudkowsky, or hell, by a late 2022 chatbot, and you potentially would (by virtue of being part of the reference class of LessWrong users or whatever), then you have failed, and it is not obvious you can make an expected positive contribution to the field of AI risk reduction at all without becoming far more, for lack of a better word, normal. I don’t understand how people think that spending their time working on increasingly elaborate pseudophilosophical things that they then call “AI alignment” works if they are also the type of people who are highly vulnerable to getting mindhacked by ChatGPT. Perhaps this is a bucket error, or I’m attacking a strawman? I don’t think Eliezer or Nate or whatever would fall to this failure mode, but in general the more philosophical parts of alignment feel worrying to me (specifically I mean the MIRI-CFAR-sphere, although again I may be attacking a strawman), because of the potential negatives of having people close to alignment solutions be unusually vulnerable to being hacked by AI.
This list is quite good: https://mecfsroadmap.altervista.org/ Feel free to DM me if you want to chat more.