Formerly alignment and governance researcher at DeepMind and OpenAI. Now independent.
Richard_Ngo
Consequentialism and utility functions or policies could in principle be as much about virtues and integrity as about hamburgers, but hamburgers are more legible and easier to administer.
Here’s one concrete way in which this isn’t true: one common simplifying assumption in economics is that goods are homogeneous, and therefore that you’re indifferent about who to buy from. However, virtuous behavior involves rewarding people you think are more virtuous (e.g. by preferentially buying things from them).
In other words, economics is about how agents interact with each other via exchanging goods and services, while virtues are about how agents interact with each other more generally.
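To make that concrete, here's a toy sketch (the sellers, virtue scores, and numbers are all mine, purely illustrative) of the difference between a buyer who treats goods as homogeneous and one who preferentially rewards the seller they judge more virtuous:

```python
import random
from collections import Counter

# Hypothetical toy market: two sellers offer an identical good at the same price.
# Under the standard homogeneity assumption the buyer is indifferent; under
# "virtue-weighted" buying, the buyer preferentially rewards the seller they
# judge more virtuous. All names and numbers here are made up for illustration.

sellers = {"A": {"virtue": 0.9}, "B": {"virtue": 0.3}}

def homogeneous_choice(sellers):
    # Indifferent buyer: picks uniformly at random.
    return random.choice(list(sellers))

def virtue_weighted_choice(sellers):
    # Buyer rewards perceived virtue: choice probability proportional to it.
    names = list(sellers)
    weights = [sellers[n]["virtue"] for n in names]
    return random.choices(names, weights=weights, k=1)[0]

print(Counter(virtue_weighted_choice(sellers) for _ in range(1000)))
# Roughly 3:1 in favor of A: virtue-weighted buying channels business toward
# the seller judged more virtuous, an incentive the homogeneous-goods model
# simply can't represent.
```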
Sufficiently different versions of yourself are just logically uncorrelated with you and there is no game-theoretic reason to account for them.
Seems odd to make an absolute statement here. The more different a version of yourself is, the less correlated it is with you, but there’s still some correlation. And UDT should also be applicable to interactions with other people, who are typically different from you in a whole bunch of ways.
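As a toy illustration of the "still some correlation" point (my own payoffs and probabilities, not anything from the original discussion): in a one-shot prisoner's dilemma against a counterpart who matches your decision with probability p, the case for cooperating weakens smoothly as p falls rather than vanishing the moment the counterpart stops being an exact copy.

```python
# Toy prisoner's dilemma against a counterpart who is a "version of you" to
# varying degrees. p is the assumed probability that their choice matches
# yours: 1.0 for an exact copy, lower as they become less similar to you.
# Payoffs are standard illustrative ones (T=4 > R=3 > P=1 > S=0).

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}

def expected_value(my_move, p):
    same = PAYOFF[(my_move, my_move)]
    diff = PAYOFF[(my_move, "D" if my_move == "C" else "C")]
    return p * same + (1 - p) * diff

for p in (1.0, 0.9, 0.7, 0.5):
    print(p, expected_value("C", p), expected_value("D", p))
# With these payoffs cooperation wins whenever p > 2/3, and its advantage
# shrinks gradually as p falls; there's no discontinuity at "not an exact copy".
```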
there’s often no internal conflict when someone is caught up in some extreme form of the morality game
Belated reply, sorry, but I basically just think that this is false—analogous to a dictator who cites parades where people are forced to attend and cheer as evidence that his country lacks internal conflict. Instead, the internal conflict has just been rendered less legible.
In the subagents frame, I would say that the subagents have an implicit contract/agreement that any one of them can seize control, if doing so seems good for the overall agent in terms of power or social status.
Note that this is an extremely non-robust agent design! In particular, it allows subagents to gain arbitrary amounts of power simply by lying about their intentions. If you encounter an agent which considers itself to be structured like this, you should have a strong prior that it is deceiving itself about the presence of more subtle control mechanisms.
Crossposted from Twitter:
This year I’ve been thinking a lot about how the western world got so dysfunctional. Here’s my rough, best-guess story:
1. WW2 gave rise to a strong taboo against ethnonationalism. While perhaps at first this taboo was valuable, over time it also contaminated discussions of race differences, nationalism, and even IQ itself, to the point where even truths that seemed totally obvious to WW2-era people also became taboo. There’s no mechanism for subsequent generations to create common knowledge that certain facts are true but usefully taboo—they simply act as if these facts are false, which leads to arbitrarily bad policies (e.g. killing meritocratic hiring processes like IQ tests).
2. However, these taboos would gradually have lost power if the west (and the US in particular) had maintained impartial rule of law and constitutional freedoms. Instead, politicization of the bureaucracy and judiciary allowed them to spread. This was enabled by the “managerial revolution” under which govt bureaucracy massively expanded in scope and powers. Partly this was a justifiable response to the increasing complexity of the world (and various kinds of incompetence and nepotism within govts) but in practice it created a class of managerial elites who viewed their intellectual merit as license to impose their ideology on the people they governed. This class gains status by signaling commitment to luxury beliefs. Since more absurd beliefs are more costly-to-fake signals, the resulting ideology is actively perverse (i.e. supports whatever is least aligned with their stated core values, like Hamas).
3. On an ideological level the managerial revolution was facilitated by a kind of utilitarian spirit under which technocratic expertise was considered more important for administrators than virtue or fidelity to the populace. This may have been a response to the loss of faith in traditional elites after WW1. The enlightened liberal perspective wanted to maintain a fiction of equality, under which administrators were just doing a job the same as any other, rather than taking on the heavy privileges and responsibilities associated with (healthy) hierarchical relationships.
4. On an economic level, the world wars led to centralization of state power over currency and the abandonment of the gold standard. While at first govts tried to preserve the fiction that fiat currencies were relevantly similar to gold-backed currencies, again there was no mechanism for later generations to create common knowledge of what had actually been done and why. The black hole of western state debt that will never be repaid creates distortions across the economy, which few economists actually grapple with because they are emotionally committed to thinking of western govts as “too big to fail”.
5. All of this has gradually eroded the strong, partly-innate sense of virtue (and respect for virtuous people) that used to be common. Virtue can be seen as a self-replicating memeplex that incentivizes ethical behavior in others—e.g. high-integrity people will reward others for displaying integrity. This is different from altruism, which rewards others regardless of their virtue. Indeed, it’s often directly opposed to altruism, since altruists disproportionately favor the least virtuous people (because they’re worse-off). Since consequentialists think that morality is essentially about altruism, much moral philosophy actively undermines ethics. So does modern economics, via smuggling in the assumption that utility functions represent selfish preferences.
6. All of this is happening against a backdrop of rapid technological progress, which facilitates highly unequal control mechanisms (e.g. a handful of people controlling global newsfeeds or AI values). The bad news is that this enables ideologies to propagate even when they are perverse and internally dysfunctional. The good news is that it makes genuine truth-seeking and virtuous cooperation increasingly high-leverage.
Addenda:
I led with the ethnonationalism stuff because it’s the most obvious, but in some sense it’s just a symptom: a functional society would have rejected the taboos when they got too obviously wrong (e.g. by defending Murray).
The deeper issue seems to be a kind of toxic egalitarianism that is against accountability, hierarchy or individual agency in general. You can trace this thread (with increasing uncertainty) thru e.g. Wilson, Marx, the utilitarians, and maybe even all the way back to Jesus.
Michael Vassar thinks of it as Germanic “Kultur” (as opposed to “Zivilisation”); I’m not well-read enough to evaluate that claim though. I’m more confident about it being driven by fear-based motivations, especially envy—as per Girard, Lacan, etc.
Some prescriptions I’m currently considering:
- reviving virtue ethics
- AI-based tools for facilitating small, high-trust, high-accountability groups. Even if we can’t have freedom of association or reliable arbitration via legal or corporate mechanisms, perhaps we can still have it via social mechanisms (especially as more and more people become functionally post-economic)
- better therapeutic interventions, especially oriented to resolving fear of death

But I spend most of my time trying to figure out the formal theory that encodes these intuitions—in which agents are understood in terms of goals (in the predictive processing sense) and boundaries rather than utility functions and credences. That feels upstream of a lot of other stuff. More here, though it’s a bit out of date.
A related post I wrote recently.
+1, though, to ChristianKl’s observation below that Geoffrey Miller is unrepresentative of MAGA because he’s already part of the broader AI safety community.
You might be interested in this post of mine which makes some related claims.
(Interested to read your post more thoroughly but for now have just skimmed it and not sure when I’ll find time to engage more.)
FWIW your writings on neuroscience are a central example of “real thinking” in my mind—it seems like you’re trying to actually understand things in a way that’s far less distorted by social pressures and incentives than almost any other writing in the field.
Reading this post led me to find a twitter thread arguing (with a bunch of examples):
One of the curious things about von Neumann was his ability to do extremely impressive technical work while seemingly missing all the big insights.
I then responded to it with my own thread arguing:
I’d even go further—I think we’re still recovering from Von Neumann’s biggest mistakes:
1. Implicitly basing game theory on causal decision theory
2. Founding utility theory on the independence axiom
3. Advocating for nuking the USSR as soon as possible

I’m not confident in my argument, but it suggests the possibility that von Neumann’s concern about his legacy was tracking something important (though, even if so, it’s unlikely that feeling insecure was a good response).
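On point 2, the standard illustration of why the independence axiom is contestable is the Allais paradox: many people prefer 1A over 1B but 2B over 2A, and no expected-utility maximizer can do both. A minimal sketch (my gloss, with the textbook probabilities and a purely illustrative utility function):

```python
# Allais paradox, as a check that no utility function u can rank 1A above 1B
# while also ranking 2B above 2A. Outcomes are in millions of dollars.

def eu(lottery, u):
    """Expected utility of a lottery given a utility function u over outcomes."""
    return sum(p * u(x) for p, x in lottery)

L1A = [(1.00, 1)]
L1B = [(0.10, 5), (0.89, 1), (0.01, 0)]
L2A = [(0.11, 1), (0.89, 0)]
L2B = [(0.10, 5), (0.90, 0)]

# EU(1A) - EU(1B) = 0.11*u(1) - 0.10*u(5) - 0.01*u(0)
# EU(2A) - EU(2B) = 0.11*u(1) - 0.10*u(5) - 0.01*u(0)
# The two gaps are algebraically identical, whatever u is:
u = lambda x: x ** 0.5          # any example utility function works here
print(eu(L1A, u) - eu(L1B, u))  # equals...
print(eu(L2A, u) - eu(L2B, u))  # ...this, for every u
```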
If someone predicts in advance that something is obviously false, and then you come to believe that it’s false, then you should update not just towards thought processes which would have predicted that the thing is false, but also towards thought processes which would have predicted that the thing is obviously false. (Conversely, if they predict that it’s obviously false, and it turns out to be true, you should update more strongly against their thought processes than if they’d just predicted it was false.)
IIRC Eliezer’s objection to bioanchors can be reasonably interpreted as an advance prediction that “it’s obviously false”, though to be confident I’d need to reread his original post (which I can’t be bothered to do right now).
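To put rough numbers on that kind of update (a stylized Bayes calculation; the hit rates below are made up, purely for illustration): a stronger claim like "obviously false" acts as a larger likelihood ratio on your estimate of the predictor's thought process, in both directions.

```python
# Update on "this pundit's thought process is reliable" after seeing the outcome.

def update_factor(p_outcome_if_reliable, p_outcome_if_unreliable):
    """Likelihood ratio multiplying your odds that the pundit is reliable."""
    return p_outcome_if_reliable / p_outcome_if_unreliable

# Assumed (illustrative) hit rates when the claim really is false:
#   a reliable pundit saying "obviously false" is right 95% of the time,
#   saying merely "false" is right 80% of the time,
#   an unreliable pundit is right 55% of the time either way.
print(update_factor(0.95, 0.55))   # ~1.73  said "obviously false", it was false
print(update_factor(0.80, 0.55))   # ~1.45  said "false", it was false
print(update_factor(0.05, 0.45))   # ~0.11  said "obviously false", it was true
print(update_factor(0.20, 0.45))   # ~0.44  said "false", it was true
# The bolder statement earns a bigger boost when vindicated and a much bigger
# penalty when refuted, matching the asymmetry described above.
```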
It’s not that moderates and radicals are trying to answer different questions (and the questions moderates are answering are epistemically easier like physics).
That seems totally wrong. Moderates are trying to answer questions like “what are some relatively cheap interventions that AI companies could implement to reduce risk assuming a low budget?” and “how can I cause AI companies to marginally increase that budget?” These questions are very different from—and much easier than—the ones the radicals are trying to answer, like “how can we radically change the governance of AI to prevent x-risk?”
The argument “there are specific epistemic advantages of working as a moderate” isn’t just a claim about categories that everyone agrees exist; it’s also a way of carving up the world. However, you can carve up the world in very misleading ways depending on how you lump different groups together. For example, if a post distinguished “people without crazy-sounding beliefs” from “people with crazy-sounding beliefs”, the latter category would lump together truth-seeking nonconformists with actual crazy people. There’s no easy way of figuring out which categories should be treated as useful vs useless, but the evidence Eliezer cites does seem relevant.
On a more object level, my main critique of the post is that almost all of the bullet points are even more true of, say, working as a physicist. And so structurally speaking I don’t know how to distinguish this post from one arguing “one advantage of looking for my keys closer to a streetlight is that there’s more light!” I.e. it’s hard to know the extent to which these benefits come specifically from focusing on less important things, and therefore are illusory, versus the extent to which you can decouple these benefits from the costs of being a “moderate”.
Yes, that can be a problem. I’m not sure why you think that’s in tension with my comment though.
Thank you Habryka (and the rest of the mod team) for the effort and thoughtfulness you put into making LessWrong good.
I personally have had few problems with Said, but this seems like an extremely reasonable decision. I’m leaving this comment in part to help make you feel empowered to make similar decisions in the future when you think it necessary (and ideally, at a much lower cost of your time).
I think one effect you’re missing is that the big changes are precisely the ones that tend to rely on factors about which it’s hard to specify important technical details. E.g. “should we move our headquarters to London” or “should we replace the CEO” or “should we change our mission statement” are mostly going to be driven by coalitional politics + high-level intuitions and arguments. Whereas “should we do X training run or Y training run” is more amenable to technical discussion, but also has less lasting effects.
people in companies care about technical details so to be persuasive you will have to be familiar with them
Big changes within companies are typically bottlenecked much more by coalitional politics than knowledge of technical details.
By thinking about reward in this way, I was able to predict[1] and encourage the success of this research direction.
Congratulations on doing this :) More specifically, I think there are two parts of making predictions: identifying a hypothesis at all, and then figuring out how likely the hypothesis is to be true or false. The former part is almost always the hard part, and that’s the bit where the “reward reinforces previous computations” frame was most helpful.
(I think Oliver’s pushback in another comment is getting strongly upvoted because, given a description of your experimental setup, a bunch of people aside from you/Quintin/Steve would have assigned reasonable probability to the right answer. But I wanted to emphasize that I consider generating an experiment that turns out to be interesting (as your frame did) to be the thing that most of the points should be assigned for.)
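For readers unfamiliar with the “reward reinforces previous computations” frame, here's a minimal REINFORCE-style sketch (my own illustration, not the experimental setup under discussion) of the literal mechanism: whichever computation produced the sampled action gets its log-probability pushed up in proportion to the reward that followed.

```python
import math, random

# Two-action softmax policy trained with a vanilla policy-gradient update.
# The reward structure and learning rate are arbitrary illustrative choices.

logits = {"a": 0.0, "b": 0.0}
reward = {"a": 1.0, "b": 0.0}
lr = 0.5

def softmax(logits):
    z = {k: math.exp(v) for k, v in logits.items()}
    total = sum(z.values())
    return {k: v / total for k, v in z.items()}

for _ in range(200):
    probs = softmax(logits)
    action = random.choices(list(probs), weights=list(probs.values()))[0]
    r = reward[action]
    # grad of log pi(action) wrt each logit: 1 - pi(action) for the chosen
    # action, -pi(k) for the others. Scaling by r means the computation that
    # produced the rewarded action is the one that gets reinforced.
    for k in logits:
        grad = (1.0 if k == action else 0.0) - probs[k]
        logits[k] += lr * r * grad

print(softmax(logits))  # probability mass concentrates on the rewarded action "a"
```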
Ty for the reply. A few points in response:
Of course, you might not know which problem your insights allow you to solve until you have the insights. I’m a big fan of constructing stylized problems that you can solve, after you know which insight you want to validate.
That said, I think it’s even better if you can specify problems in advance to help guide research in the field. The big risk, then, is that these problems might not be robust to paradigm shifts (because paradigm shifts could change the set of important problems). If that is your concern, then I think you should probably give object-level arguments that solving auditing games is a bad concrete problem to direct attention to. (Or argue that specifying concrete problems is in general a bad thing.)
The bigger the scientific advance, the harder it is to specify problems in advance which it should solve. You can and should keep track of the unresolved problems in the field, as Neel does, but trying to predict specifically which unresolved problems in biology Darwinian evolution would straightforwardly solve (or which unresolved problems in physics special relativity would straightforwardly solve) is about as hard as generating those theories in the first place.
I expect that when you personally are actually doing your scientific research you are building sophisticated mental models of how and why different techniques work. But I think that in your community-level advocacy you are emphasizing precisely the wrong thing—I want junior researchers to viscerally internalize that their job is to understand (mis)alignment better than anyone else does, not to optimize on proxies that someone else has designed (which, by the nature of the problem, are going to be bad proxies).
It feels like the core disagreement is that I intuitively believe that bad metrics are worse than no metrics, because they actively confuse people/lead them astray. More specifically, I feel like your list of four problems is closer to a list of things that we should expect from an actually-productive scientific field, and getting rid of them would neuter the ability for alignment to make progress:
“Right now, by default research projects get one bit of supervision: After the paper is released, how well is it received?” Not only is this not one bit, I would also struggle to describe any of the best scientists throughout history as being guided primarily by it. Great researchers can tell by themselves, using their own judgment, how good the research is (and if you’re not a great researcher that’s probably the key skill you need to work on).
But also, note how anti-empirical your position is. The whole point of research projects is that they get a huge amount of supervision from reality. The job of scientists is to observe that supervision from reality and construct theories that predict reality well, no matter what anyone else thinks about them. It’s not an exaggeration to say that discarding the idea that intellectual work should be “supervised” by one’s peers is the main reason that science works in the first place (see Strevens for more).

“Lacking objective, consensus-backed progress metrics, the field is effectively guided by what a small group of thought leaders think is important/productive to work on.” Science works precisely because it’s not consensus-backed—see my point on empiricism above. Attempts to make science more consensus-backed undermine the ability to disagree with existing models/frameworks. But also: the “objective metrics” of science are the ability to make powerful, novel predictions in general. If you know specifically what metrics you’re trying to predict, the thing you’re doing is engineering. And some people should be doing engineering (e.g. engineering better cybersecurity)! But if you try to do it without a firm scientific foundation you won’t get far.
I think it’s good that “junior researchers who do join are unsure what to work on.” It is extremely appropriate for them to be unsure what to work on, because the field is very confusing. If we optimize for junior researchers being more confident on what to work on, we will actively be making them less truth-tracking, which makes their research worse in the long term.
Similarly, “it’s hard to tell which research bets (if any) are paying out and should be invested in more aggressively” is just the correct epistemic state to be in. Yes, much of the arguing is unproductive. But what’s much less productive is saying “it would be good if we could measure progress, therefore we will design the best progress metric we can and just optimize really hard for that”. Rather, since evaluating the quality of research is the core skill of being a good scientist, I am happy with junior researchers all disagreeing with each other and just pursuing whichever research bets they want to invest their time in (or the research bets they can get the best mentorship when working on).
Lastly, it’s also good that “it’s hard to grow the field”. Imagine talking to Einstein and saying “your thought experiments about riding lightbeams are too confusing and unquantifiable—they make it hard to grow the field. You should pick a metric of how good our physics theories are and optimize for that instead.” Whenever a field is making rapid progress it’s difficult to bridge the gap between the ontology outside the field and the ontology inside the field. The easiest way to close that gap is simply for the field to stop making rapid progress, which is what happens when something becomes a “numbers-go-up” discipline.
I think that e.g. RL algorithms researchers have some pretty deep insights about the nature of exploration, learning, etc.
They have some. But so did Galileo. If you’d turned physics into a numbers-go-up field after Galileo, you would have lost most of the subsequent progress, because you would’ve had no idea which numbers going up would contribute to progress.
I’d recommend reading more about the history of science, e.g. The Sleepwalkers by Koestler, to get a better sense of where I’m coming from.
I strongly disagree. “Numbers-Go-Up Science” is an oxymoron: great science (especially what Kuhn calls revolutionary science) comes from developing novel models or ontologies which can’t be quantitatively compared to previous ontologies.
Indeed, in an important sense, the reason the alignment problem is a big deal in the first place is that ML isn’t a science which tries to develop deep explanations of artificial cognition, but instead a numbers-go-up discipline.
And so the idea of trying to make (a subfield of) alignment more like architecture design, performance optimization or RL algorithms feels precisely backwards—it steers people directly away from the thing that alignment research should be contributing.
Strongly upvoted. Alignment researchers often feel so compelled to quickly contribute to decreasing x-risk that they end up studying non-robust categories that won’t generalize very far, and sometimes actively make the field more confused. I wish that most people doing this were just trying to do the best science they could instead.
That’s a mechanism by which I might overestimate the support for Hamas. But the thing I’m trying to explain is the overall alignment between leftists and Hamas, which is not just a twitter bubble thing (e.g. see university encampments).
More generally, leftists profess many values which are upheld the most by western civilization (e.g. support for sexual freedom, women’s rights, anti-racism, etc). But then in conflicts they often side specifically against western civilization. This seems like a straightforward example of pessimization.