What level of capabilities are we talking about here? I think this story is plausible when AIs are either superintelligent at persuasion, or superintelligent at decision making, but otherwise I’d be surprised if it was a big deal (low confidence).
The basic reason is that I expect AIs to be in a position similar to technical advisors (who are often quite misaligned, e.g. emphasizing threats from their field more than is warranted, or having certain kinds of political biases) with respect to decision makers. These human advisors do have quite a bit of influence, but I would not describe them as “extremely persuasive”. Human technical advisors not being very trustworthy and politicians trusting them too little / too much is a problem, but it has not been deadly to democracies, and I expect issues with AI persuasion to be roughly similar as long as the situation does not become so crazy that decision makers have to massively defer to advisors (in particular because decision makers stop being able to understand things at all, even with the help of adversarial advisors, and because the simple default actions become catastrophic).
Also on the very biggest issues (e.g. “are AIs misaligned”, “are AIs conscious”) I suspect AI developers will not just let the AIs “say what they want”, which will reduce the damage AIs can cause.
What is the concrete thing you are imagining AIs convincing people of that would be catastrophic and that would not have happened with regular human advisors?
(The problem in this domain I do worry about is “will we make ASI policy/strategy advisors which we can mostly defer to when humans become too weak to follow and react to the intelligence explosion”, and maybe it captures what you mean by “persuasion”. I think calling it “the problem of making trustworthy ASI policy/strategy advisors” is more descriptive and clear, while calling it “persuasion” points at the wrong thing.)
Thanks for engaging and disagreeing.

> What level of capabilities are we talking about here?
I wrote this article in particular with the AI models from the AI 2027 TTX that “outperform humans at every remote work task and accelerate AI R&D by 100x” in mind. It depends on how spiky the capabilities are, but it feels reasonable to assume that these AIs are at least as good as the best humans at decision-making and persuasion, if not better. (If you feel like neither of these would be true for this model, then I’m fine with instead applying these claims to a more powerful AI that hits these criteria, but I would be surprised if these capabilities came late enough to be irrelevant to shaping the dynamics of the intelligence explosion and beyond. Though given your last paragraph, that might be cruxy.)
But I’m not even sure the ‘persuasion’ capability needs to be that good. A lot of the persuasion ultimately routes through i) the AI gaining lots of trust by legibly succeeding and being useful to decision-makers across a broad set of domains, and ii) the AIs having lots of influence and control over people’s cognition (such as information control).
> Human technical advisors not being very trustworthy and politicians trusting them too little / too much is a problem, but it has not been deadly to democracies
I think under the picture you described in the second paragraph it would be less surprising if AI didn’t have a large impact, but I think it’s likely the AIs will be more like a strong combination of primary executive assistant and extremely competent, trusted advisor. I think they will have more reach and capability than human technical advisors, end up being competitive with or better than them at decision-making, and ultimately end up supporting/automating many parts of the decision maker’s cognition. One important form of control I imagine the AIs having is acting as a middleman for much of the information you see, e.g. by summarizing documents or conversations. (I expect the incentives to do this to keep increasing.)
My guess is that the reason misaligned human advisors have not been deadly to democracy is that it has been difficult for them to influence many other decision makers, and that the human decision maker still did nearly all of the important thinking (‘how true is what they are saying’, ‘what are their incentives’, engaging with people who disagreed, etc.), with better incentives to do so. I think this changes once you introduce strong AIs, and additionally, I think these AIs will be better at playing to the many things I listed in the section “Humans believe what they are incentivized to, and the incentives will be to believe the AIs.”
> Also on the very biggest issues (e.g. “are AIs misaligned”, “are AIs conscious”) I suspect AI developers will not just let the AIs “say what they want”, which will reduce the damage AIs can cause.
Yup, though I don’t have strong beliefs on whether they will succeed at this well enough to sufficiently reduce the damage, especially in worst-case alignment regimes.
> What is the concrete thing you are imagining AIs convincing people of that would be catastrophic and that would not have happened with regular human advisors?
I don’t have particular guesses, but there are probably many decisions influencing how AI turns out that are not individually catastrophic but are catastrophic together. I also wouldn’t rule out small sets of very important decisions that would make it much harder for AI to go well; the example I gave in the article, of making the humans fire/distrust specific AI advisors, feels salient here.
Ok, that makes sense: you are thinking about AIs that are somewhat smarter than the ones I spend the most time thinking about (you are describing things in the Agent-4 to Agent-5 range, while I often imagine using control for AIs in the Agent-3 to Agent-4 range). I agree that for those, the temptation to defer on every strategy question would likely be big. Though I would describe the problem as “temptation to defer on every strategy question due to AI competence”, not “persuasion”. I think it’s weird to have a post titled “powerful AIs might be extremely persuasive” that does not hinge much on AI persuasion capabilities.