The point I was trying to make is that, if I understood you correctly, you were trying to appeal to common sense morality to argue that deontological rules like this are good on consequentialist grounds. I was trying to give examples of why I don’t think this immediately follows, and why you need to actually make object-level arguments about this and engage with the counterarguments. If you want to argue for deontological rules, you need to justify why those rules lead to good outcomes.
I am not trying to defend the claim that I am highly confident that what Anthropic is doing is ethical and net good for the world, but I am trying to defend the claim that there are plans vaguely similar to Anthropic’s that I would predict are net good in expectation, e.g., becoming a prominent actor and then leveraging your influence to push for good norms and good regulations. Your arguments would also imply that plans like that should be deontologically prohibited, and I disagree.
I don’t think this follows from naive moral intuition. A crucial disanalogy with murder is that if you don’t kill someone, the counterfactual is that the person is alive. While if you don’t race towards AGI, the counterfactual is that maybe someone else makes it and we die anyway. This means that we need to be engaging in discussion about the consequences of there being another actor pushing for this, the consequences of other actions this actor may take, and how this all nets out, which I don’t feel like you’re doing.
I expect AGI to be either the best or worst thing that has ever happened, and this means that important actions will typically be high variance, with major positive or negative consequences. Declining to engage in things with the potential for high negative consequences severely restricts your action space. And given that it’s plausible that there’s a terrible outcome even if we do nothing, I don’t think the act-omission distinction applies.
> I am not trying to defend the claim that I am highly confident that what Anthropic is doing is ethical and net good for the world, but I am trying to defend the claim that there are plans vaguely similar to Anthropic’s that I would predict are net good in expectation, e.g., becoming a prominent actor and then leveraging your influence to push for good norms and good regulations. Your arguments would also imply that plans like that should be deontologically prohibited, and I disagree.
Thank you for clarifying, I think I understand now. I’m hearing that you’re not arguing in defense of Anthropic’s specific plan, but in defense of the claim that some part of the space of plans that involve racing to build something with a (say) >20% chance of causing an extinction-level event is good, a part that Anthropic may or may not fall into.
> A crucial disanalogy with murder is that if you don’t kill someone, the counterfactual is that the person is alive. While if you don’t race towards AGI, the counterfactual is that maybe someone else makes it and we die anyway.
This isn’t disanalogous. As I have already said in this thread, you are not allowed to murder someone even if someone else is planning to murder them. If you find out multiple parties are going to murder Bob, you are not now allowed to murder Bob in a way that is slightly less likely to be successful.
Crucially, it is not to be assumed that we will build AGI in the next 1-2 decades. If the countries of the world decided to ban training runs of a particular size, because we don’t want to take this sort of extinction-level risk, then it would not happen. Assuming this out of the hypothesis space will get you into bad ethical territory. Suppose a military general says “War is inevitable, the only question is how fast it’s over when it starts and how few deaths there are.” This general would never take responsibility for instigating one. Similarly, if you assume with certainty that extinction-risking AGI will be developed in the next few decades, you absolve yourself of all responsibility for being the one who develops it.
> Declining to engage in things with the potential for high negative consequences severely restricts your action space.
I think you are failing to understand the concept of deontology by replacing “breaks deontological rules” with “highly negative consequences”. Deontology doesn’t say “you can tell a lie if it saves you from telling two lies later” or “lying is wrong unless you get a lot of money for it”. It says “don’t tell lies”. There are exceptional circumstances for all rules, but unless you’re in an exceptional circumstance, you treat them as rules, and don’t treat violations as integers to be traded against each other.
When the stakes get high, it is not time to start lying, cheating, killing, or unilaterally betting the extinction of the human race. If someone thinks it is, then they simply can’t be trusted to follow these ethical principles when it matters.
> Thank you for clarifying, I think I understand now. I’m hearing that you’re not arguing in defense of Anthropic’s specific plan, but in defense of the claim that some part of the space of plans that involve racing to build something with a (say) >20% chance of causing an extinction-level event is good, a part that Anthropic may or may not fall into.
Yes that is correct
> This isn’t disanalogous. As I have already said in this thread, you are not allowed to murder someone even if someone else is planning to murder them. If you find out multiple parties are going to murder Bob, you are not now allowed to murder Bob in a way that is slightly less likely to be successful.
I disagree. If a patient has a deadly illness then I think it is fine for a surgeon to perform a dangerous operation to try to save their life. I think the word murder is obfuscating things and suggest we instead talk in terms of “taking actions that may lead to death”, which I think is more analogous—hopefully we can agree Anthropic won’t intentionally cause human extinction. I think it is totally reasonable to take actions that net decrease someone’s probability of dying, while introducing some novel risks.
> I think you are failing to understand the concept of deontology by replacing “breaks deontological rules” with “highly negative consequences”. Deontology doesn’t say “you can tell a lie if it saves you from telling two lies later” or “lying is wrong unless you get a lot of money for it”. It says “don’t tell lies”. There are exceptional circumstances for all rules, but unless you’re in an exceptional circumstance, you treat them as rules, and don’t treat violations as integers to be traded against each other.
I think we’re talking past each other. I understood you as arguing “deontological rules against X will systematically lead to better consequences than trying to evaluate each situation carefully, because humans are fallible”. I am trying to argue that your proposed deontological rule does not obviously lead to better consequences as an absolute rule. Please correct me if I have misunderstood.
I am arguing that “things to do with human extinction from AI, when there’s already a meaningful likelihood” are not a domain where ethical prohibitions like “never do things that could lead to human extinction” are productive. For example, you help run LessWrong, which I’d argue has helped raise the salience of AI x-risk, which plausibly has accelerated timelines. I personally think this is outweighed by other effects, but that’s via reasoning about the consequences. Your actions and Anthropic’s feel more like a difference in scale than a difference in kind.
> Assuming this out of the hypothesis space will get you into bad ethical territory.
I am not arguing that AI x-risk is inevitable, in fact I’m arguing the opposite. AI x-risk is both plausible and not inevitable. Actions to reduce this seem very valuable. Actions that do this will often have side effects that increase risk in other ways. In my opinion, this is not sufficient cause to immediately rule them out.
Meanwhile, I would consider anyone pushing hard to make frontier AI to be highly reckless if they were the only one who could cause extinction, and they could unilaterally stop—this is a way to unilaterally bring risk to zero, which is better than any other action. But Anthropic has no such action available, and so I want them to take the actions that reduce risk as much as possible. And there are arguments for proceeding and arguments for stopping.
> As I have already said in this thread, you are not allowed to murder someone even if someone else is planning to murder them. If you find out multiple parties are going to murder Bob, you are not now allowed to murder Bob in a way that is slightly less likely to be successful.
> I disagree. If a patient has a deadly illness then I think it is fine for a surgeon to perform a dangerous operation to try to save their life. I think the word murder is obfuscating things and suggest we instead talk in terms of “taking actions that may lead to death”, which I think is more analogous—hopefully we can agree Anthropic won’t intentionally cause human extinction. I think it is totally reasonable to take actions that net decrease someone’s probability of dying, while introducing some novel risks.
This is simplifying away key details.
If you go up to a person with a deadly illness and non-consensually do a dangerous surgery on them, this is wrong. If you kill them via this, their family has a right to sue you / prosecute you for murder. Once again, simply because some bad outcome is likely, you do not have an ethical mandate to now go and cause it yourself. Deontology is typically about forbidding classes of action that on net make the world worse even when locally you have a good reason. Talking about “taking actions that lead to death” explicitly obfuscates the mechanism. I know you won’t endorse this once I point it out, but under this strictly-consequentialist framework “blogging on LessWrong about extinction-risk from AI” and “committing murder” are just two different “actions that lead to death”, and neither can be thought of as having different deontological lines drawn. On the contrary, “don’t commit murder” and “don’t build a doomsday machine” are simple and natural deontological rules, whereas “don’t build a blogging platform with unusually high standards for truthseeking” is not.
> I am trying to argue that your proposed deontological rule does not obviously lead to better consequences as an absolute rule. Please correct me if I have misunderstood.
I am not trying to argue for an especially novel deontological rule… “building a doomsday machine” is wrong. It’s a far greater sin than murder. I think you’d do better to think of the AI companies as more like competing political factions, each of whose base is very motivated toward committing a genocide against their neighbors. If your political faction commits a genocide, and you were merely a top-200 ranked official who didn’t particularly want a genocide, you still bear moral responsibility for it even though you only did paperwork and took meetings and maybe worked in a different department. Just because there are two political factions whose bases are uncomfortably attracted to the idea of committing genocide does not now make it ethically clear for you to make a third one that hungers for genocide but has wiser people in charge.
I am not advocating for some new interesting deontological rule. I am arguing that the obvious rule against building a doomsday machine applies here straightforwardly. Deontological violations don’t stop being bad just because other people are committing them. You cannot commit murder just because other people do, and you cannot build a doomsday machine just because other people are. You generally shouldn’t build doomsday machines even if you have a good reason. To argue against this you should show why deontological rules break down, and then apply that to this case, but the doctor example you gave doesn’t show that, because by default you aren’t actually allowed to non-consensually do risky surgeries on people even if it makes sense on the consequentialist calculus.
I continue to feel like we’re talking past each other, so let me start again. We both agree that causing human extinction is extremely bad. If I understand you correctly, you are arguing that it makes sense to follow deontological rules, even if there’s a really good reason breaking them seems locally beneficial, because on average, the decision theory that’s willing to do harmful things for complex reasons performs badly.
The goal of my various analogies was to point out that this is not actually a fully correct statement about common sense morality. Common sense morality has several exceptions for things like having someone’s consent to take on a risk, someone doing bad things to you, and innocent people being forced to do terrible things.
Given that exceptions exist for times when we believe the general policy is bad, I am arguing that there should be an additional exception: if there is a realistic chance that a bad outcome happens anyway, and you believe you can reduce the probability of this bad outcome (even after accounting for cognitive biases, sources of overconfidence, etc.), it can be ethically permissible to take actions whose side effects increase the probability of the bad outcome in other ways.
When analysing the reasons I broadly buy the deontological framework for “don’t commit murder”, I see some clear lines in the sand, such as maintaining a valuable social contract, and the fact that if you do nothing, the outcomes will be broadly good. Further, society has never really had to deal with something as extreme as doomsday machines, which makes me hesitant to appeal to common sense morality at all. To me, the point where things break down with standard deontological reasoning is that this is just very outside the context where such priors were developed and have proven to be robust. I am not comfortable naively assuming they will generalize, and I think this is an incredibly high stakes thing where far and away the only thing I care about is taking the actions that will actually, in practice, lead to a lower probability of extinction.
Regarding your examples, I’m completely ethically comfortable with someone making a third political party in a country where the population has two groups who both strongly want to cause genocide to the other. I think there are many ways that such a third political party could reduce the probability of genocide, even if it ultimately comprises a political base who wants negative outcomes.
Another example is nuclear weapons. From a certain perspective, holding nuclear weapons is highly unethical as it risks nuclear winter, whether from provoking someone else or from a false alarm on your side. While I’m strongly in favour of countries unilaterally switching to a no-first-use policy and pursuing mutual disarmament, I am not in favour of countries unilaterally disarming themselves. By my interpretation of your proposed ethical rules, this suggests countries should unilaterally disarm. Do you agree with that? If not, what’s disanalogous?
COVID-19 would be another example. Biology is not my area of expertise, but as I understand it, governments took actions that were probably good but risked some negative effects that could have made things worse. For example, widespread use of vaccines or antivirals, especially via the first-doses-first approach, plausibly made it more likely that resistant strains would spread, potentially affecting everyone else. In my opinion, these were clearly net-positive actions because the good done far outweighed the potential harm.
You could raise the objection that governments are democratically elected while Anthropic is not, but there were many other actors in these scenarios, like uranium miners, vaccine manufacturers, etc., who were also complicit.
Again, I’m purely defending the abstract point that “plans that could increase the risk of human extinction, even via building the doomsday machine yourself, are not automatically ethically forbidden”. You’re welcome to critique Anthropic’s actual actions as much as you like. But you seem to be making a much more general claim.
> If I understand you correctly, you are arguing that it makes sense to follow deontological rules, even if there’s a really good reason breaking them seems locally beneficial, because on average, the decision theory that’s willing to do harmful things for complex reasons performs badly.
Hm… I would say that one should follow deontological rules like “don’t lie” and “don’t steal” and so on because we fail to understand or predict the knock-on consequences. For instance, lying and stealing can get the world into a much worse equilibrium of mutual liars/stealers, in ways that are hard to predict. And being a good person can get the world into a much better equilibrium of mutually-honorable people, in ways that are hard to predict. And also because, if things do screw up in some hard-to-predict way, then when you look back, the rule will often be the easiest line in the sand to draw.
For instance, if SBF is wondering at what point he could have most reliably intervened on his whole company collapsing and ruining the reputation of things associated with it, he might talk about certain deals he made or strategic plays with Binance or the US Govt, for he is not a very ethical person; I would talk about not taking customer deposits.
If and when we get to an endgame where tons of AI systems are sociopathically lying and stealing money and ultimately killing the humans, I suspect people of SBF’s mindset will again talk about how the US and China should’ve played things, or how Musk should’ve played OpenAI, and how Amodei should’ve played things with DC. And I will talk about not racing to develop the unaligned AI systems in the first place.
> To me, the point where things break down with standard deontological reasoning is that this is just very outside the context where such priors were developed and have proven to be robust. I am not comfortable naively assuming they will generalize, and I think this is an incredibly high stakes thing where far and away the only thing I care about is taking the actions that will actually, in practice, lead to a lower probability of extinction.
I don’t really know why you think that this generalization can’t be made to things we’ve not seen before. So many things I experience haven’t been seen before in history. How many centuries have we had to develop ethical intuitions for how to write on web forums? There are still answers to these questions, and I can identify ethical and unethical behaviors, as can you (e.g. sockpuppeting, doxing, brigading, etc). There can be ethical lines in novel situations, not only historically common ones.
> Another example is nuclear weapons. From a certain perspective, holding nuclear weapons is highly unethical as it risks nuclear winter, whether from provoking someone else or from a false alarm on your side. While I’m strongly in favour of countries unilaterally switching to a no-first-use policy and pursuing mutual disarmament, I am not in favour of countries unilaterally disarming themselves. By my interpretation of your proposed ethical rules, this suggests countries should unilaterally disarm. Do you agree with that? If not, what’s disanalogous?
I am not sure what I would propose if I believed nuclear winter was a serious existential threat; it seems plausible to me that the ethical thing would be to unilaterally disarm. I suspect that at the very least, if I were a country, I would openly and aggressively campaign for mutual disarmament. (If any AI capabilities company openly campaigned for making it illegal to develop AI, then I suspect I would consider that plausibly quite ethical.)
> I’m purely defending the abstract point that “plans that could increase the risk of human extinction, even via building the doomsday machine yourself, are not automatically ethically forbidden”.
To be clear, I think you’re defending a somewhat stronger claim. You write further up thread:
> I am not trying to defend the claim that I am highly confident that what Anthropic is doing is ethical and net good for the world, but I am trying to defend the claim that there are plans vaguely similar to Anthropic’s that I would predict are net good in expectation, e.g., becoming a prominent actor and then leveraging your influence to push for good norms and good regulations. Your arguments would also imply that plans like that should be deontologically prohibited, and I disagree.
My current stance is that all actors currently in this space are doing things prohibited by basic deontology. This is not merely an unfortunate outcome, but a grave sin, for they are building doomsday machines, likely the greatest evil that we will ever experience in our history (regardless of whether they are successful). So I want to emphasize that the boundary here is not between “better and worse plans” but between “morally murky and morally evil plans”. Insofar as you commit a genocide or worse, history should remember your names as people of shame whom we must take pains never to repeat. Insofar as you played with the idea, thought you could control it, and failed, then history should still think of you this way.
I believe we disagree over where the deontological lines are, given you are defending “vaguely similar plans to Anthropic’s”. Perhaps you could point to where you think they are? Presumably you think that a Larry Page style “this is just the next stage in evolution” indifference to human extinction AI-project would be morally wrong?
Here are two lines that I think might cross into being acceptable [edit: or rather, “only morally murky”] from my perspective.
I think it might be appropriate to risk building a doomsday machine if, loudly and in public, you told everyone “I AM BUILDING A POTENTIAL DOOMSDAY MACHINE, AND YOU SHOULD SHUT MY INDUSTRY DOWN. IF YOU DON’T THEN I WILL RIDE THIS WAVE AND ATTEMPT TO IMPROVE IT, BUT YOU REALLY SHOULD NOT LET ANYONE DO WHAT I AM DOING.” And was engaged in serious lobbying and advertising efforts to this effect.
I think it could possibly be acceptable to build an AI capabilities company if you committed to never releasing or developing any frontier capabilities AND if all employees also committed not to leave and release frontier capabilities elsewhere AND you were attempting to use this to differentially improve society’s epistemics and awareness of AI’s extinction-level threat. Though this might still cause too much economic investment into AI as an industry, I’m not sure.
I of course do not think any current project looks superficially like these.
> Here are two lines that I think might cross into being acceptable from my perspective.
> I think it might be appropriate to risk building a doomsday machine if, loudly and in public, you told everyone “I AM BUILDING A POTENTIAL DOOMSDAY MACHINE, AND YOU SHOULD SHUT MY INDUSTRY DOWN. IF YOU DON’T THEN I WILL RIDE THIS WAVE AND ATTEMPT TO IMPROVE IT, BUT YOU REALLY SHOULD NOT LET ANYONE DO WHAT I AM DOING.” And was engaged in serious lobbying and advertising efforts to this effect.
> I think it could possibly be acceptable to build an AI capabilities company if you committed to never releasing or developing any frontier capabilities AND if all employees also committed not to leave and release frontier capabilities elsewhere AND you were attempting to use this to differentially improve society’s epistemics and awareness of AI’s extinction-level threat. Though this might still cause too much economic investment into AI as an industry, I’m not sure.
> I of course do not think any current project looks superficially like these.
Okay, after reading this it seems to me that we broadly do agree and are just arguing over price. I’m arguing that it is permissible to try to build a doomsday machine if there are really good reasons to believe doing so net reduces the probability of doomsday. It sounds like you agree, and give two examples of what “really good reasons” could be. I’m sure we disagree on the boundaries of where the really good reasons lie, but I’m trying to defend the point that you actually need to think about the consequences.
What am I missing? Is it that you think these two are really good reasons, not because of the impact on the consequences, but because of the attitude/framing involved?
I’m not Ben, but I think you don’t understand. I think explaining what you are doing loudly in public isn’t like “having a really good reason to believe it is net good”; it is instead more like asking for consent.
Like, you are saying “please stop me by shutting down this industry”, and if you don’t get shut down, that is analogous to consent: you’ve informed society about what you’re doing and why, and tried to ensure that if everyone else followed a similar sort of policy we’d be in a better position.
(Not claiming I agree with Ben’s perspective here, just trying to explain it as I understand it.)
Ah! Thanks a lot for the explanation, that makes way more sense, and is much weaker than what I thought Ben was arguing for. Yeah, this seems like a pretty reasonable position, especially “take actions where if everyone else took them we would be much better off”, and I am completely fine with holding Anthropic to that bar. I’m not fully sold on the asking-for-consent framing, but mostly for practical reasons—I think there are many ways in which society is not able to act consistently, and the actions of governments on many issues are not a reflection of the true informed will of the people, but I expect there’s some reframe here that I would agree with.
> and is much weaker than what I thought Ben was arguing for.
I don’t think Ryan (or I) was intending to imply a measure of degree, so my guess is that, unfortunately, communication somehow still failed. Like, I don’t think Ryan (or Ben) is saying “it’s OK to do these things, you just have to ask for consent”. Ryan was just trying to point out a specific way in which things don’t bottom out in consequentialist analysis.
If you end up walking away thinking that Ben believes “the key thing to get right for AI companies is to ask for consent before building the doomsday machine”, which I feel is the only interpretation of what you could mean by “weaker” that I currently have, then I think that would be a pretty deep misunderstanding.
OK, I’m going to bow out of the conversation at this point; I’d guess further back and forth won’t be too productive. Thanks all!
There is something important to me in this conversation about not trusting one’s consequentialist analysis when evaluating proposals to violate deontological lines, and from my perspective you still haven’t managed to paraphrase this basic ethical idea or shown that you’ve understood it, which I feel a little frustrated about. Ah well. I have still been glad of this opportunity to argue it through, and I feel grateful to Neel for that.