Yes, of course I care about whether someone takes AI risk seriously, but if someone is also untrustworthy, in my opinion that untrustworthiness acts as a multiplier on their negative impact on the world. I do not want to create scheming, untrustworthy stakeholders who start doing sketchy stuff around AI risk. That’s how a lot of bad stuff has already happened in the past.
No-true-Scotsman-ish counterargument: no one who actually gets AI risk would engage in this kind of tomfoolery. This is the behavior of someone who almost got it, but then missed the last turn and stumbled into the den of the legendary Black Beast of Aaargh. In the abstract, I think “we should be willing to consider supporting literal Voldemort if we’re sure he has the correct model of AI X-risk” goes through.
The problem is that it just totally doesn’t work in practice, not even on pure consequentialist grounds:
You can never tell whether Voldemorts actually understand and believe your cause, or whether they’re just really good at picking the right things to say to get you to support them. No, not even if you’ve considered the possibility that they’re lying and you still feel sure they’re not. Your object-level evaluations just can’t be trusted. (At least, not if they’re competent at their thing. And if they’re not just evil but also bad at it, so bad you can tell when they’re being honest, why would you support them in the first place?)
Voldemorts and their plans are often more incompetent than they seem,[1] and when their evil-but-“effective” plan predictably blows up, you and your cause are going to suffer reputational damage and end up in a worse position than your starting one. (You’re not gonna find an Altman, you’ll find an SBF.)
Voldemorts are naturally predisposed to misunderstanding AI risk in precisely the ways that later make them engage in sketchy stuff around it. They’re very tempted to view ASI as a giant pile of power they can grab. (They hallucinate the Ring when they look into the Black Beast’s den, if I’m to mix my analogies.)
In general, if you’re considering giving power to a really effective but untrustworthy person because they seem credibly aligned with your cause despite their general untrustworthiness (they also don’t want to die to ASI!), you are almost certainly just getting exploited. These sorts of people should be avoided like the plague. (Even in cases where you think you can keep them in check, you’re going to have to spend so much effort paranoically combing through everything they do in search of gotchas that it almost certainly wouldn’t be worth it.)
[1] Probably because of that thing where, if a good person dramatically abandons their morals for the greater good, they feel it’s a monumental enough sacrifice for the universe to take notice and make it worth it.
A lot of Paranoia: A Beginner’s Guide is actually trying to set up a bunch of the prerequisites for making this kind of argument more strongly. In particular, a feature of people who act in untrustworthy ways and surround themselves with unprincipled people is that they end up sacrificing most of their sanity on the altar of paranoia.
Like, the fictional HPMoR Voldemort happened not to have any adversaries who could disrupt his OODA loop, but that was purely an artifact of the fiction. A world with two Voldemort-level-competent players results in two people nuking their sanity as they try to get one over on each other, and at that point you can’t really rely on them having good takes or sane stances on much of anything (or, if they’re genuinely smart enough, on them making an actually binding alliance, which, via things like Unbreakable Vows, is surprisingly doable in the HPMoR universe, but which in reality runs into many more issues).