> This has pretty low argumentative/persuasive force in my mind.
Note that my comment was not optimized for argumentative force about the overarching point. Rather, you asked how they “can” still benefit the world, so I was trying to give a central example.
In the second half of this comment I'll give a couple more central examples of how virtues can allow people to avoid the traps you named. You shouldn't consider these to be optimized for argumentative force either, because they'll seem ad hoc to you. However, they might still be useful as data points.
Figuring out how to describe the underlying phenomenon I'm pointing at in a compelling, non-ad-hoc way is one of my main research focuses. The best I can do right now is to say that many of the ways in which people produce outcomes that are harmful (by their own lights) seem to arise from a handful of underlying dynamics. I call this phenomenon pessimization. One way in which I'm currently thinking about virtues is as a set of cognitive tools for preventing pessimization. As one example, kindness and forgiveness help to prevent cycles of escalating conflict with others, which is a major mechanism by which people's values get pessimized. This one is pretty obvious to most people; let me sketch out some less obvious mechanisms below.
> What if someone isn't smart enough to come up with a new line of illegible research, but does see some legible problem with an existing approach that they can contribute to? What would cause them to avoid this?
This actually happened to me: when I graduated from my master's, I wasn't cognitively capable of coming up with new lines of illegible alignment research, in part because I was too status-seeking. Instead I went to work at DeepMind, and ended up spending a lot of my time working on RLHF, which is a pretty central example of a "legible" line of research.
However, I also wasn't cognitively capable of making much progress on RLHF, because I couldn't see how it addressed the core alignment problem, and so it didn't seem fundamental enough to maintain my interest. Instead I spent most of my time trying to understand the alignment problem philosophically (resulting in this sequence), at the expense of my promotion prospects.
In this case I think I had the virtue of deep curiosity, which steered my attention towards illegible problems even though my top-down plan was to contribute to alignment by doing RLHF research. These days, whatever you might think of my research, few people complain that it’s too legible.
There are other possible versions of me who had that deep curiosity but weren’t smart enough to have generated a research agenda like my current one; however, I think they would still have left DeepMind, or at least not been very productive on RLHF.
> And even for the hypothetical virtuous person who starts doing illegible research on their own: what happens when other people catch up to them and the problem becomes legible to leaders/policymakers? How would they know to stop working on that problem and switch to another one that is still illegible?
When a field becomes crowded, there's a pretty obvious inference that you can make more progress by moving to a less crowded field. I think people often don't draw that inference because moving to a less crowded field costs them prestige, is emotionally/financially risky, etc. Virtues help remove those blockers.