Formerly alignment and governance researcher at DeepMind and OpenAI. Now independent.
Richard_Ngo
Note that I’ve changed my position dramatically over the last few years, and now basically endorse something very close to what I was calling “rationality realism” (though I’d need to spend some time rereading the post to figure out exactly how close my current position is).
In particular, I think that we should be treating sociology, ethics and various related domains much more like we treat physics.
I also endorse this quote from a comment above, except that I wouldn’t call it “thinking studies” but maybe something more like “the study of intelligent agency” (and would add game theory as a central example):
there is a rich field of thinking-studies. it’s like philosophy, math, or engineering. it includes eg Chomsky’s work on syntax, Turing’s work on computation, Gödel’s work on logic, Wittgenstein’s work on language, Darwin’s work on evolution, Hegel’s work on development, Pascal’s work on probability, and very many more past things and very many more still mostly hard-to-imagine future things
FWIW I used to agree with you but now agree with Nate. A big part of the update was developing a model of how “PR risks” work via a kind of herd mentality, where very few people are actually acting on their object-level beliefs, and almost everyone is just tracking what everyone else is tracking.
In such a setting, “internal influence” strategies tend to do very little long-term, and maybe even reinforce the taboo against talking honestly. This is roughly what seems to have happened in DC, where the internal influence approach was swept away by a big Overton window shift after ChatGPT. Conversely, a few principled individuals can have a big influence by speaking honestly (here’s a post about the game theory behind this).
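To make the herd-mentality picture concrete, here's a minimal sketch using a Granovetter-style threshold model (my own toy framing, not necessarily the model from the linked post): most people speak up only once enough others already have, while a few principled individuals speak regardless, and that handful can be the difference between silence and a society-wide cascade.

```python
# Toy Granovetter-style threshold cascade (illustrative sketch only).
# Each conformist speaks honestly only once the fraction of people already
# speaking exceeds their personal threshold, i.e. they track what everyone
# else is doing rather than acting on their object-level beliefs.
# "Principled" individuals have a threshold of zero and speak regardless.

def run_cascade(n_people=1000, n_principled=0):
    n_conformists = n_people - n_principled
    # Conformist thresholds spread evenly between 0.2% and 40%.
    thresholds = [0.002 + 0.398 * i / (n_conformists - 1) for i in range(n_conformists)]
    thresholds += [0.0] * n_principled
    speaking = [t == 0.0 for t in thresholds]
    changed = True
    while changed:
        changed = False
        fraction = sum(speaking) / n_people
        for i, t in enumerate(thresholds):
            if not speaking[i] and fraction >= t:
                speaking[i] = True
                changed = True
    return sum(speaking) / n_people

for k in (0, 1, 5):
    print(f"{k} principled speakers -> {run_cascade(n_principled=k):.1%} end up speaking")
# With these toy numbers: 0 -> 0.0%, 1 -> 0.1%, 5 -> 100.0% speaking.
```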
In my own case, I felt a vague miasma of fear around talking publicly while at OpenAI (and to a lesser extent at DeepMind), even though in hindsight there were often no concrete things that I endorsed being afraid of—for example, there was a period where I was roughly indifferent about leaving OpenAI, but still scared of doing things that might make people mad enough to fire me.
I expect that there’s a significant inferential gap between us, so this is a hard point to convey, but one way that I might have been able to bootstrap my current perspective from inside my “internal influence” frame is to try to identify possible actions X such that, if I got fired for doing X, this would be a clear example of the company leaders behaving unjustly. Then even the possible “punishment” for doing X is actually a win.
“consistent with my position above I’d bet that in the longer term we’d do best to hit a button that ended all religions today, and then eat the costs and spend the decades/centuries required to build better things in their stead.”
Would you have pressed this button at every other point throughout history too? If not, when’s the earliest you would have pressed it?
Good question. One answer is that my reset mechanisms involve cultivating empathy, and replacing fear with positive motivation. If I notice myself being too unempathetic or too fear-driven, that’s worrying.
But another answer is just that, unfortunately, the reality distortion fields are everywhere—and in many ways more prevalent in “mainstream” positions (as discussed in my post). Being more mainstream does get you “safety in numbers”—i.e. it’s harder for you to catalyze big things, for better or worse. But the cost is that you end up in groupthink.
I like this comment.
For the sake of transparency, while in this post I’m mostly trying to identify a diagnosis, in the longer term I expect to try to do political advocacy as well. And it’s reasonable to expect that people like me who are willing to break the taboo for the purposes of diagnosis will be more sympathetic to ethnonationalism in their advocacy than people who aren’t. For example, I’ve previously argued on twitter that South Africa should have split into two roughly-ethnonationalist states in the 90s, instead of doing what they actually did.
However, I expect that the best ways of fixing western countries won’t involve very much ethnonationalism by historical standards, because it’s a very blunt tool. Also, I suspect that breaking the taboo now will actually lead to less ethnonationalism in the long term. For example, even a little bit more ethnonationalism would plausibly have made European immigration policies much less insane over the last few decades, which would then have prevented a lot of the political polarization we’re seeing today.
This is a thoughtful comment, I appreciate it, and I’ll reply when I have more time (hopefully in a few days).
Thanks for the extensive comment. I’m not sure it’s productive to debate this much on the object level. The main thing I want to highlight is that this is a very good example of how the taboo that I discussed above operates.
On most issues, people (and especially LWers) are generally open to thinking about the benefits and costs of each stance, since tradeoffs are real.
However, in the case of ethnonationalism, even discussing the taboo on it (without explicitly advocating for it) was enough to trigger a kind of zero-tolerance attitude in your comment.
This is all the more striking because the main historical opponent of ethnonationalist regimes was globalist communism, which also led to large-scale atrocities. Yet when people defend a “socialist” or “egalitarian” cluster of ideas, that doesn’t lead to anywhere near this level of visceral response.
My main bid here is for readers to notice that there is a striking asymmetry in how we think about and discuss 20th century history, which is best explained via the thing I hypothesized above: a strong taboo on ethnonationalism in the wake of WW2, which has then distorted our ability to think about many other issues.
For the most obvious example, for the life of me I cannot understand how leaving the gold standard makes a culture less appreciative of any kind of moral virtue, unless you equate two very different senses of the word “value”.
Might reply to the rest later but just to respond to what you call “the most obvious example”: consider a company which has a difficult time evaluating how well its employees are performing (i.e. most of them). Some employees will work hard even when they won’t directly be rewarded for that, because they consider it virtuous to do so. However, if you then add to their team a bunch of other people who are rewarded for slacking off, the hard-working employees may become demotivated and feel like they’re chumps for even trying to be virtuous.
The extent to which modern governments hand out money causes a similar effect across western societies (edited: for example, if many people around you are receiving welfare, then working hard yourself is less motivating). They would not be able to do this as much if their currencies were still on the gold standard, because it would be more obvious that they are insolvent.
I used to agree with your understanding but I am now more skeptical. For example, here’s a story that says the opposite:
The poorer humans are, the more vulnerable each human is to the group consensus. People who disagreed with groups could in the past easily be assaulted by mobs, or harassed until they were left with literally starvation-level wealth. Nowadays, though, even victims of extreme ‘cancel culture’ don’t face such risks, because society is wealthy enough that you can do things like move to a new city to avoid mobs, or get charities to feed and clothe you even if you lose your job.
Also it’s much harder to design parasitic egregores now than it used to be, because our science is much better and so we know many more facts, which makes it harder for egregores to lie.

I’m not saying my story is true, but it does highlight that the load-bearing question is actually something like “how does the offense-defense balance against parasitic egregores scale with wealth?” Why don’t we live in a world where wealth can buy a society defenses against such egregores?
Or maybe we do live in such a world, and we are just failing to buy those defenses. That seems like a really dumb situation to be in, but I think my post is broadly describing how it might arise.
That’s a mechanism by which I might overestimate the support for Hamas. But the thing I’m trying to explain is the overall alignment between leftists and Hamas, which is not just a twitter bubble thing (e.g. see university encampments).
More generally, leftists profess many values which are upheld the most by western civilization (e.g. support for sexual freedom, women’s rights, anti-racism, etc). But then in conflicts they often side specifically against western civilization. This seems like a straightforward example of pessimization.
Consequentialism and utility functions or policies could in principle be as much about virtues and integrity as about hamburgers, but hamburgers are more legible and easier to administer.
Here’s one concrete way in which this isn’t true: one common simplifying assumption in economics is that goods are homogeneous, and therefore that you’re indifferent about who to buy from. However, virtuous behavior involves rewarding people you think are more virtuous (e.g. by preferentially buying things from them).
In other words, economics is about how agents interact with each other via exchanging goods and services, while virtues are about how agents interact with each other more generally.
Sufficiently different versions of yourself are just logically uncorrelated with you and there is no game-theoretic reason to account for them.
Seems odd to make an absolute statement here. Versions of yourself that are more different are less and less correlated with you, but there’s still some correlation. And UDT should also be applicable to interactions with other people, who are typically different from you in a whole bunch of ways.
there’s often no internal conflict when someone is caught up in some extreme form of the morality game
Belated reply, sorry, but I basically just think that this is false—analogous to a dictator who cites parades where people are forced to attend and cheer as evidence that his country lacks internal conflict. Instead, the internal conflict has just been rendered less legible.
In the subagents frame, I would say that the subagents have an implicit contract/agreement that any one of them can seize control, if doing so seems good for the overall agent in terms of power or social status.
Note that this is an extremely non-robust agent design! In particular, it allows subagents to gain arbitrary amounts of power simply by lying about their intentions. If you encounter an agent which considers itself to be structured like this, you should have a strong prior that it is deceiving itself about the presence of more subtle control mechanisms.
Crossposted from Twitter:
This year I’ve been thinking a lot about how the western world got so dysfunctional. Here’s my rough, best-guess story:
1. WW2 gave rise to a strong taboo against ethnonationalism. While perhaps at first this taboo was valuable, over time it also contaminated discussions of race differences, nationalism, and even IQ itself, to the point where even truths that seemed totally obvious to WW2-era people also became taboo. There’s no mechanism for subsequent generations to create common knowledge that certain facts are true but usefully taboo—they simply act as if these facts are false, which leads to arbitrarily bad policies (e.g. killing meritocratic hiring processes like IQ tests).
2. However, these taboos would gradually have lost power if the west (and the US in particular) had maintained impartial rule of law and constitutional freedoms. Instead, politicization of the bureaucracy and judiciary allowed them to spread. This was enabled by the “managerial revolution” under which govt bureaucracy massively expanded in scope and powers. Partly this was a justifiable response to the increasing complexity of the world (and various kinds of incompetence and nepotism within govts) but in practice it created a class of managerial elites who viewed their intellectual merit as license to impose their ideology on the people they governed. This class gains status by signaling commitment to luxury beliefs. Since more absurd beliefs are more costly-to-fake signals, the resulting ideology is actively perverse (i.e. supports whatever is least aligned with their stated core values, like Hamas).
3. On an ideological level the managerial revolution was facilitated by a kind of utilitarian spirit under which technocratic expertise was considered more important for administrators than virtue or fidelity to the populace. This may have been a response to the loss of faith in traditional elites after WW1. The enlightened liberal perspective wanted to maintain a fiction of equality, under which administrators were just doing a job the same as any other, rather than taking on the heavy privileges and responsibilities associated with (healthy) hierarchical relationships.
4. On an economic level, the world wars led to centralization of state power over currency and the abandonment of the gold standard. While at first govts tried to preserve the fiction that fiat currencies were relevantly similar to gold-backed currencies, again there was no mechanism for later generations to create common knowledge of what had actually been done and why. The black hole of western state debt that will never be repaid creates distortions across the economy, which few economists actually grapple with because they are emotionally committed to thinking of western govts as “too big to fail”.
5. All of this has gradually eroded the strong, partly-innate sense of virtue (and respect for virtuous people) that used to be common. Virtue can be seen as a self-replicating memeplex that incentivizes ethical behavior in others—e.g. high-integrity people will reward others for displaying integrity. This is different from altruism, which rewards others regardless of their virtue. Indeed, it’s often directly opposed to altruism, since altruists disproportionately favor the least virtuous people (because they’re worse-off). Since consequentialists think that morality is essentially about altruism, much moral philosophy actively undermines ethics. So does modern economics, via smuggling in the assumption that utility functions represent selfish preferences.
6. All of this is happening against a backdrop of rapid technological progress, which facilitates highly unequal control mechanisms (e.g. a handful of people controlling global newsfeeds or AI values). The bad news is that this enables ideologies to propagate even when they are perverse and internally dysfunctional. The good news is that it makes genuine truth-seeking and virtuous cooperation increasingly high-leverage.
Addenda:

I led with the ethnonationalism stuff because it’s the most obvious, but in some sense it’s just a symptom: a functional society would have rejected the taboos when they got too obviously wrong (e.g. by defending Murray).
The deeper issue seems to be a kind of toxic egalitarianism that is against accountability, hierarchy or individual agency in general. You can trace this thread (with increasing uncertainty) thru e.g. Wilson, Marx, the utilitarians, and maybe even all the way back to Jesus.
Michael Vassar thinks of it as Germanic “Kultur” (as opposed to “Zivilisation”); I’m not well-read enough to evaluate that claim though. I’m more confident about it being driven by fear-based motivations, especially envy—as per Girard, Lacan, etc.
Some prescriptions I’m currently considering:
- reviving virtue ethics
- AI-based tools for facilitating small, high-trust, high-accountability groups. Even if we can’t have freedom of association or reliable arbitration via legal or corporate mechanisms, perhaps we can still have it via social mechanisms (especially as more and more people become functionally post-economic)
- better therapeutic interventions, especially oriented to resolving fear of death

But I spend most of my time trying to figure out the formal theory that encodes these intuitions—in which agents are understood in terms of goals (in the predictive processing sense) and boundaries rather than utility functions and credences. That feels upstream of a lot of other stuff. More here, though it’s a bit out of date.
Edited to add: I am surprised both by the extent of disagree-voting (-38 as of writing this), and by the extent to which this is decoupled from karma (23 as of writing this). This is an impressive level of decoupling. Given that the gap between my views and those of most LWers is much bigger than I thought, I’ll have a think about how to better convey my perspective in a way that makes cruxes clearer. Since many LWers believe in something like Eliezer’s civilizational inadequacy thesis, though, I’m curious about the best explanations other people have for why our current civilization is “inadequate”.
A related post I wrote recently.
+1 to ChristianKl’s observation below though that Geoffrey Miller is unrepresentative of MAGA because he’s already part of the broader AI safety community.
You might be interested in this post of mine which makes some related claims.
(Interested to read your post more thoroughly but for now have just skimmed it and not sure when I’ll find time to engage more.)
FWIW your writings on neuroscience are a central example of “real thinking” in my mind—it seems like you’re trying to actually understand things in a way that’s far less distorted by social pressures and incentives than almost any other writing in the field.
Reading this post led me to find a twitter thread arguing (with a bunch of examples):
One of the curious things about von Neumann was his ability to do extremely impressive technical work while seemingly missing all the big insights.
I then responded to it with my own thread arguing:
I’d even go further—I think we’re still recovering from Von Neumann’s biggest mistakes:
1. Implicitly basing game theory on causal decision theory
2. Founding utility theory on the independence axiom
3. Advocating for nuking the USSR as soon as possible

I’m not confident in my argument, but it suggests the possibility that von Neumann’s concern about his legacy was tracking something important (though, even if so, it’s unlikely that feeling insecure was a good response).
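For readers who haven’t seen it stated, the independence axiom from point 2 is the standard von Neumann–Morgenstern condition on preferences over lotteries (quoted here purely for reference, not as an argument either way):

```latex
% vNM independence axiom: mixing two lotteries with a common third lottery
% leaves the preference between them unchanged.
\[
A \succeq B \;\iff\; pA + (1-p)C \;\succeq\; pB + (1-p)C
\qquad \text{for all lotteries } A, B, C \text{ and all } p \in (0,1].
\]
```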
If someone predicts in advance that something is obviously false, and then you come to believe that it’s false, then you should update not just towards thought processes which would have predicted that the thing is false, but also towards thought processes which would have predicted that the thing is obviously false. (Conversely, if they predict that it’s obviously false, and it turns out to be true, you should update more strongly against their thought processes than if they’d just predicted it was false.)
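One minimal way to make this quantitative is to treat the two predictions as probabilistic forecasts and compare their log scores; the 2% and 25% figures below are stand-ins I’ve chosen for “obviously false” and “false”, not anything canonical.

```python
# Illustrative sketch: score "X is obviously false" vs "X is false" as
# probabilistic forecasts. The gap in log scores between the two forecasters
# equals the log Bayes factor for deferring to one forecaster's probability
# over the other's, so bigger gaps mean stronger updates.

import math

forecasts = {"'obviously false'": 0.02, "'false'": 0.25}  # assumed P(X is true)

for label, p_true in forecasts.items():
    score_if_false = math.log(1 - p_true)  # X turns out false
    score_if_true = math.log(p_true)       # X turns out true
    print(f"{label:18} log-score if X false: {score_if_false:+.2f}, "
          f"if X true: {score_if_true:+.2f}")
```

With these numbers the “obviously false” forecaster gains a modest amount of extra credit when X turns out false (−0.02 vs −0.29) but loses far more when X turns out true (−3.91 vs −1.39), which is the asymmetry described above.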
IIRC Eliezer’s objection to bioanchors can be reasonably interpreted as an advance prediction that “it’s obviously false”, though to be confident I’d need to reread his original post (which I can’t be bothered to do right now).
In my ontology “virtues” are ethical design patterns about how to make decisions.
I’m a virtue ethicist because I think that this kind of ethical design pattern is more important than ethical design patterns about what decisions to make (albeit with some complications that I’ll explore in some upcoming posts).
(Having said that, I feel some sense that I’m not going to use “ethical design patterns” very much going forward—it’s a little unwieldy as a phrase. I think I will just use “ethics”, by contrast with things like “altruism” which IMO are less well-understood as design patterns.)