Reassessing heroic responsibility, in light of subsequent events.
I think @cousin_it made a good point: “if many people adopt heroic responsibility to their own values, then a handful of people with destructive values might screw up everyone else, because destroying is easier than helping people”. I would generalize it to people with biased beliefs (which are often downstream of a kind of value difference, i.e., selfish genes).
It seems to me that “heroic responsibility” (or something equivalent but not causally downstream of Eliezer’s writings) is contributing to the current situation, of multiple labs racing for ASI and essentially forcing the AI transition on humanity without consent or political legitimacy, each thinking or saying that they’re justified because they’re trying to save the world. It also seemingly justifies or obligates Sam Altman to fight back when the OpenAI board tried to fire him, if he believed the board was interfering with his mission.
Perhaps “heroic responsibility” would make more sense if overcoming bias were easy, but in a world where it’s actually hard and/or few people are actually motivated to do it, which seems to be the world we live in, spreading the idea of “heroic responsibility” seems, well, irresponsible.
My sense is that most of the people with lots of power are not taking heroic responsibility for the world. I think that Amodei and Altman intend to achieve global power and influence but this is not the same as taking global responsibility. I think, especially for Altman, the desire for power comes first relative to responsibility. My (weak) impression is that Hassabis has less will-to-power than the others, and that Musk has historically been much closer to having responsibility be primary.
I don’t really understand this post as doing anything other than asking “on the margin, are we happy or sad about present large-scale action?” and then saying that the background culture should correspondingly praise or punish large-scale action. Which is maybe reasonable, but alternatively too high-level a gloss. As per the usual idea of rationality, I think “you are capable of taking large-scale action in a healthy way” is true in some worlds and not in others, and you should try to figure out which world you’re in.
The financial incentives around AI development are blatantly insanity-inducing on the topic, and anyone should’ve been able to guess that going in; I don’t think this was a difficult question. Though I guess someone already exceedingly wealthy (e.g., having $1B or $10B) could have unusually strong reason not to be concerned about that particular incentive (and I think it is the case that Musk has seemed differently insane than the others taking action in this area, and lacking in some of the insanities).
However, I think most moves around wielding this level of industry should be construed as building an egregore more powerful than you. The founders/CEOs of the big AI tech companies are not able to simply turn their companies off, nor their industry. If they grow to believe their companies are bad for the world, either they’ll need to spend many years dismantling / redirecting them, or else they’ll simply quit / move on and some other person will take their place. So it’s still default-irresponsible even if you believe you can maintain personal sanity.
Overall I think taking responsibility for things is awesome, and I wish people were doing more of it and trying harder. And I wish people took ultimate responsibility for as big a thing as they can muster. This is not the same as “trying to pull the biggest lever you can” or “reaching for power on a global level”; those are quite different heuristics. Grabbing power can obviously just cost you sanity, and often those pulling the biggest lever they can are doing so foolishly.
As a background model, I think if someone wants to take responsibility for some part of the world going well, by default this does not look like “situating themselves in the center of legible power”. Lonely scientist/inventor James Watt spent his early years struggling with poverty before successfully inventing better steam engines, and had far more influence, by helping cause the Industrial Revolution, than most anyone in government did during his era. I think confusing “moving toward legible power” for “having influence over the world” is one of the easiest kinds of insanity.
> My sense is that most of the people with lots of power are not taking heroic responsibility for the world. I think that Amodei and Altman intend to achieve global power and influence but this is not the same as taking global responsibility. I think, especially for Altman, the desire for power comes first relative to responsibility. My (weak) impression is that Hassabis has less will-to-power than the others, and that Musk has historically been much closer to having responsibility be primary.

Can you expand on this? How can you tell the difference, and does it make much of a difference in the end (e.g., if most people get corrupted by power regardless of initial intentions)?
> As a background model, I think if someone wants to take responsibility for some part of the world going well, by default this does not look like “situating themselves in the center of legible power”.

And yet Eliezer, the writer of “heroic responsibility”, is also the original proponent of “build a Friendly AI to take over the world and make it safe”. If your position is that “heroic responsibility” is itself right, but Eliezer and others just misapplied it, that seems to imply we need some kind of post-mortem on what went wrong when people tried to apply the concept, and on how future people can avoid making the same mistake. My guess is that, like other human biases, this mistake is hard to avoid even if you point it out to people or try other ways of teaching people to avoid it, because the drive for status and power is deep-seated and has a strong evolutionary logic.
(My position is, let’s not spread ideas/approaches that will predictably be “misused”, e.g., as justification for grabbing power, similar to how we shouldn’t develop AI that will predictably be “misused”, even if nominally “aligned” in some sense.)
> Can you expand on this? How can you tell the difference, and does it make much of a difference in the end (e.g., if most people get corrupted by power regardless of initial intentions)?

But I don’t believe most people get corrupted by power regardless of initial intentions? I don’t think Francis Bacon was corrupted by power, I don’t think James Watt was corrupted by power, I don’t think Stanislav Petrov was corrupted by power, and all of these people had far greater influence over the world than most people who are “corrupted by power”.
I’m hearing that you’d be interested in me saying more words about the difference between what it looks like to be motivated by responsibility versus by power-seeking. I’ll say some words; you can see if they help.
I think someone motivated by responsibility will often end up looking more aligned with their earlier self over time even as they grow and change; will often not accept opportunities for a lot of power/prestige/money because those are uninteresting to them; will often make sacrifices of power/prestige for ethical reasons; and will pursue a problem they care about long after most would give up or think it unlikely to be solved.
I think someone primarily seeking power will be much more willing to do things that pollute the commons or break credit-allocation mechanisms to get credit, and generally game a lot of systems that other people are earnestly rising through. They will more readily pivot on what issue they say they care about or are working on because they’re not attached to the problem, but to the reward for solving the problem, and many rewards can be gotten from lots of different problems. They’ll be more guided by what’s fashionable right now, and more attuned to it. They’ll maneuver themselves in order to be able to politically work with whoever has power that they want, regardless of the ethics/competence/corruption of those people.
> > As a background model, I think if someone wants to take responsibility for some part of the world going well, by default this does not look like “situating themselves in the center of legible power”.
>
> And yet Eliezer, the writer of “heroic responsibility”, is also the original proponent of “build a Friendly AI to take over the world and make it safe”.

Building an AGI doesn’t seem to me like a very legible mechanism of power, or at least it didn’t in the era when Eliezer pursued it (when it wasn’t also credibly “a path to making billions of dollars and getting incredible prestige”). The word ‘legible’ was doing a lot of work in the sentence I wrote.
Another framing I sometimes look through (H/T Habryka) is constrained vs. unconstrained power. Having a billion dollars is unconstrained power, because you can use it to do a lot of different things: buy loads of different companies or resources. Being an engineer overseeing missile-defense systems in the USSR is very constrained; you have an extremely well-specified set of things you can control. This changes the adversarial forces on you, because in the former case a lot of people stand to gain a lot of different possible things they want if they can get leverage over you, and they have to be concerned about a lot of different ways you could be playing them. So the pressures toward insanity are higher. Paths that give you the ability to influence very specific things, routed through very constrained powers, are less insanity-inducing, I think. Most routes that look like “build a novel invention in a way that isn’t getting you lots of money/status along the way” are also less insanity-inducing, and I rarely find such a person to have become as insane as some of the tech-company CEOs have. I also think people motivated by taking responsibility for fixing a particular problem in the world are more likely to take constrained power, because… they aren’t particularly motivated by all the other power they might be able to get.
I suspect I haven’t addressed your cruxes so far about whether this idea of heroic responsibility is/isn’t predictably misused. I’m willing to try again if you wish, or you can try pointing again at what you’d guess I’m missing.
Well said. Bravo.
I’m also uncertain about the value of “heroic responsibility”, but this downside consideration can be mostly addressed by “don’t do things which are highly negative sum from the perspective of some notable group” (or other anti-unilateralist curse type intuitions). Perhaps this is too subtle in practice.
If humans can’t easily overcome their biases or avoid having destructive values/beliefs, then it would make sense to limit the damage through norms and institutions (things like informed consent, boards, separation of powers and responsibilities between branches of government). Heroic responsibility seems antithetical to group-level solutions, because it implies that one should ignore norms like “respect the decisions of boards/judges” if needed to “get the job done”, and reduces social pressure to follow such norms (by giving up the moral high ground from which one could criticize such norm violations).
You’re suggesting a very different approach, of patching heroic responsibility with anti-unilateralist-curse-type intuitions (on the individual level), but that’s still untried and seemingly quite risky / possibly unworkable. Until we have reason to believe that the new solution is an improvement on the existing ones, it still seems irresponsible to spread an idea that damages the existing solutions.
Hmm, I’m not sure that the idea of heroic responsibility undermines these existing mechanisms for preventing these problems, partially because I’m skeptical these existing mechanisms make much of a difference in the relevant case.
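To make the unilateralist’s-curse intuition in this exchange concrete, here is a minimal sketch (the parameters and function name are hypothetical, chosen purely for illustration, not anything from the thread): each of N well-meaning actors gets an independent, noisy private estimate of an action’s true value and acts unilaterally if their estimate looks positive, so the probability that a genuinely harmful action gets taken climbs quickly with N.

```python
import random

def unilateralist_sim(n_actors, true_value=-1.0, noise_sd=2.0, trials=10_000):
    """Fraction of trials in which at least one of n_actors well-meaning
    actors takes an action whose true value is negative, given that each
    actor sees only a noisy private estimate and acts if it looks positive.
    (Illustrative toy model; parameter values are arbitrary.)"""
    acted = 0
    for _ in range(trials):
        # Each actor's private estimate = true value + independent Gaussian noise.
        estimates = (random.gauss(true_value, noise_sd) for _ in range(n_actors))
        # The action happens if *any* single actor judges it worthwhile.
        if any(e > 0 for e in estimates):
            acted += 1
    return acted / trials

for n in (1, 5, 25):
    print(f"{n:>2} actors -> harmful action taken {unilateralist_sim(n):.0%} of the time")
```

Under these toy assumptions, a lone actor errs roughly 30% of the time, while with 25 independent actors the harmful action is taken almost always; this is the sense in which “don’t do things which are highly negative sum from the perspective of some notable group” functions as a correction that no individual actor’s private caution can replicate.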
Can this be summarized as “don’t optimize for what you believe is good too hard, as you might be mistaken about what is good”?
Maybe “don’t advertise too hard that one should optimize for what they believe is good, because someone crazy will hear you and get radicalized by the message” (such as the Zizians).
Many people seem to have an instinct to translate “work hard” as “do some crazy violent action”. Just a few days ago, someone on ACX asked: “if you believe that AI is harmful, why don’t you support a terrorist group to kill the AI researchers?” For a certain mindset, this is the obvious logical response to feeling strongly about something: if you are not murdering people left and right, it means you don’t care enough about your cause.
I guess there is an evolutionary reason for this: we are running on corrupted hardware. In our evolutionary past, successfully organizing senseless violence could be an efficient way to get to the top of the tribe, so we are tempted by instinct to propose it as a solution for various problems.
The question is how to communicate the message so that it reaches people who are likely to translate “work hard” as becoming stronger, learning about how stuff works, designing a solution, and testing it, but somehow does not reach people who are likely to translate it as “hurt everyone who disagrees”.
Is this analogous to saying “capabilities research is dangerous and should not be pursued”, but for the human psyche rather than for AI?
Yeah, that seems a reasonable way to look at it. “Heroic responsibility” could be viewed as a kind of “unhobbling via prompt engineering”, perhaps.
I kind of doubt that leaders at big labs would self-identify as being motivated by anything like Eliezer’s notion of heroic responsibility. If any do self-identify that way, though, they’re either doing it wrong or misunderstanding it. Eliezer has written tons of stuff about the need to respect deontology and also think about all of the actual consequences of your actions, even (especially) when the stakes are high:
> The critical question here is: what happens if the plot successfully places the two of them in an epistemic Cooperation-Defection Dilemma, where rather than the two of them just having different goals, Carissa believes that he is mistaken about what happens...
>
> In this case, Carissa could end up believing that to play ‘Defect’ against him would be to serve even his own goals, better than her Cooperating would serve them. Betraying him might seem like a friendly act, an act of aid.

(https://glowfic.com/replies/1874768#reply-1874768)

> If he commits to a drastic action he will estimate that actual victory lies at the end of it, and his desperation and sacrifice will not have figured into that estimation process as positive factors. His deontology is not for sale at the price point of failure.

(https://glowfic.com/replies/1940939#reply-1940939)
Starting an AI lab in order to join a doomed race to superintelligence, and then engaging in a bunch of mundane squabbles for corporate control, seems like exactly the opposite of the sentiment here:
> For Albus Dumbledore, as for her, the rule in extremis was to decide what was the right thing to do, and do it no matter the cost to yourself. Even if it meant breaking your bounds, or changing your role, or letting go of your picture of yourself. That was the last resort of Gryffindor.

(https://hpmor.com/chapter/93)
Also, re this example:

> It also seemingly justifies or obligates Sam Altman to fight back when the OpenAI board tried to fire him, if he believed the board was interfering with his mission.
In general, it seems perfectly fine and normal for a founder-CEO to fight back against a board ouster—no need to bring heroic responsibility into it. Of course, all parties including the CEO and the board should stick to legal / above-board / ethical means of “fighting back”, but if there’s a genuine disagreement between the board and the CEO on how to best serve shareholder interests (or humanity’s interests, for a non-profit), why wouldn’t both sides vigorously defend their own positions and power?
Perhaps the intended reading of your example is that heroic responsibility would obligate or justify underhanded tactics to win control, when the dispute has existential consequences. But I think that’s a misunderstanding of the actual concept. Ordinary self-confidence and agency obligate you to defend your own interests / beliefs / power, and heroic responsibility says that you’re obligated to win without stepping outside the bounds of deontology or slipping into invalid / motivated reasoning.
This argument seems convincing only if you don’t have those destructive values. One man’s destructive values are another’s low-hanging fruit, and those who see low-hanging fruit everywhere won’t give up on the fruit just because others may pick it.
Since bad people won’t heed your warning it doesn’t seem in good people’s interests to heed it either.
An analogy: one can make the same argument wrt rationality itself. It’s dual-use! Someone with bad values can use rationality to do a lot of harm! Does that mean good people shouldn’t use rationality? No!
> Since bad people won’t heed your warning it doesn’t seem in good people’s interests to heed it either.

I’m not trying to “warn bad people”. I think we have existing (even if imperfect) solutions to the problem of destructive values and biased beliefs, which “heroic responsibility” actively damages, so we should stop spreading that idea or even argue against it. See my reply to Ryan, which is also relevant here.
Ah yes, but if all these wannabe heroes keep going we’ll be really screwed, so it’s up to me to take a stand against the fools dooming us all… the ratchet of Moloch cranks ever clockwise