Thanks for the thoughtful reply. It took me a lot of squinting, but IIUC you’re saying:
1. Different kinds of minds, produced by different kinds of architectures, should likely exhibit very different levels of scary traits such as monomaniacal sociopathy.
2. Stop focusing on LLMs so much; they’re not the main threat. Yes, they seem to exhibit more value-roundedness because they’re trained to imitate humans, but they aren’t likely to reach AGI anytime soon.
3. Focus more on RL agents and “brain-like” architectures; those are built very differently and plausibly would have much stronger sociopathic tendencies.
4. So, our alarm at the plausible risk of unleashing AGI-level ruthless monomaniacal RL agents is justified.
I don’t disagree with any of these, and your reply has helped me see the big implications of the LLM / RL distinction much more clearly. And I’m sorry for the vehemence of my earlier comment; part of it came from me not “getting” that my experience with LLMs may not apply to other AI architectures. That hadn’t previously sunk in, so thank you.
But I want to push back on the deeper framing I’m hearing: something in me reacts strongly (and negatively) to your starting assumptions, in a way that doesn’t go away when you walk me through the object-level reasoning above.
Your article treats general intelligence as a “pure ruthless optimizer” by default, casts any whiff of empathy or sociality or ruth as a bolt-on module, a happy accident of our particular evolutionary pressures, and then tries to diagnose why human minds deviate from that sociopathic, amoral natural state. My objection, less passionately stated, is that this seems to privilege a theoretical framework as the baseline and then treat our only empirical data on AGI-level intelligence as the anomaly to be explained away. That feels deeply backwards to me. We have N=1 examples of human-level general intelligence. All 1 of them are deeply prosocial. The “agent foundations” framework predicts they shouldn’t be. Maybe the framework is justified and the data is misleading! But unless you state at the outset that this article is only for people who already treat “intelligence is inherently amoral” as self-evident, it’s a hell of an assumption to leave undefended.
And that’s really what I’m reacting to — not necessarily the framework itself (I’m not deep enough to seriously evaluate it), but how your article presents it. You jump straight to treating “general intelligence is by default asocial” as your starting axiom, without even a nod towards how ridiculous that sounds to someone who doesn’t already share it.* To someone outside the alignment-is-hard camp, it sounds like you’re saying: “Obviously, the natural state of a smart mind is psychopathy. Now let me propose a theory for why humans got lucky.” That is a wild premise to just breeze past! And yet the whole article is built on top of it as though it’s uncontroversial.
I’ll give one example of why it doesn’t feel uncontroversial to me. Think about what the actual x-risk nightmare scenarios involve: an entity that can deceive, manipulate, model human psychology with precision, coordinate complex strategies, communicate with other AI agents, negotiate delicate trust boundaries, and outmaneuver entire civilizations of socially intelligent beings. That’s not a souped-up AlphaZero. The very capabilities that make the nightmare scary (deep, nuanced understanding of how humans think and feel and operate) are exactly the kind of rich social cognition that, in our only empirical examples, comes tangled up with perspective-taking, moral awareness, and empathy. Maybe those can be separated. Maybe you can build an entity that has all the social understanding and zero empathy. But that’s not self-evident to me, and your article seems to treat it as a given.
I’m not saying the framework is wrong. Maybe the alignment-is-hard camp has excellent reasons for treating “sociality” and “social awareness” as separable rather than highly correlated. But as someone coming from outside that camp, I want to flag: the article reads like it’s written solely for people who already agree that asocial ruthless optimization is the natural default for human-level intelligence. For the rest of us, the framing doesn’t just fail to persuade; it might actively push us away, because it seems founded on a premise that flatly contradicts our lived experience of what minds are and how minds work. If part of your goal here is to bridge the gap between the two camps you describe, I think that gap starts right here, at that axiom.
* I know you acknowledge at the start that these two “camps” have conflicting intuitions which need to be reconciled. But then you immediately treat them asymmetrically: one intuition gets to be the self-evident framework, and the other one gets to be an interesting anomaly to account for. Acknowledging that normal humans find your (camp’s) premises alien isn’t the same as defending those premises. It just means you’re being politer about the assumption you’re making.
I’m again years late to the party, but there are a couple of things here that I want to respond to:
If I read between the lines, you seem to be suggesting: “It’s not a strawman if you don’t take religious beliefs seriously. Non-believers have no obligation to care whether their critique of religion accurately represents the thing being critiqued.” If I’m misreading you, please tell me. But if that is your position, it’s the *exact opposite* of the spirit of epistemic generosity that this article is trying to advocate, and that LessWrong overall shows such a strong commitment to. Rigorous debate means representing the opposing view as strongly and faithfully as possible, *especially* when you think it’s idiotic. You don’t earn the right to strawman someone’s claims (or to say that it’s only a strawman “from their perspective”) just because you find their belief system harmful and abhorrent.
Which points to a fun irony: this is an article about how people avoid their beliefs’ real weak points. My comment calls out what I see as Yudkowsky taking a low blow, departing from his admirably high argumentative standards, by importing a capability (e.g., mass teleportation, or everyone simultaneously dropping dead by miracle) that religious sources set no precedent for, even though I agree with the broader point that God had more humane options available and allegedly acts in ways that break my suspension of disbelief. Your response essentially amounts to: “It’s fine to be sloppy when criticizing religion because believers are sloppy too.” But that’s *exactly* the kind of motivated reasoning the article is warning against, just applied in the anti-religion direction. The article’s lesson is that epistemic honesty requires applying rigorous scrutiny to your own beliefs and arguments at least as much as you apply it to views you disagree with.
I’ll also note that my original comment wasn’t arguing “the morality angle.” I wasn’t defending God’s alleged behavior; honestly, he sounds like kind of a jerk. I was saying the criticism itself is sloppy on its own terms. Those are very different claims. Conflating them makes it harder to have the kind of honest, careful conversation that this site was built for.