Ben Pace comments on Mikhail Samin’s Shortform

Ben Pace 11 Jun 2025 2:41 UTC
4 points
6
I’m not saying that it’s implausible that the consequences might seem better. I’m stating it’s still morally wrong to race toward causing a likely extinction-level event as that’s a pretty schelling place for a deontological lines against action.
- Neel Nanda 11 Jun 2025 8:18 UTC
  11 points
  1
  Parent
  Ah. In that case we just disagree about morality. I am strongly in favour of judging actions by their consequences, especially for incredibly high stakes actions like potential extinction level events. If an action decreases the probability of extinction I am very strongly in favour of people taking it.
  
  I’m very open to arguments that the consequences would be worse, that this is the wrong decision theory, etc, but you don’t seem to be making those?
  - Ben Pace 11 Jun 2025 23:13 UTC
    24 points
    23
    Parent
    I too believe we should ultimately judge things based on their consequences. I believe that having deontological lines against certain actions is something that leads humans to make decisions with better consequences, partly because we are bounded agents that cannot well-compute the consequences of all of our actions.
    For instance, I think you would agree that it would be wrong to kill someone in order to prevent more deaths, today here in the Western world. Like, if an assassin is going to kill two people, but says if you kill one then he won’t kill the other, if you kill that person you should still be prosecuted for murder. It is actually good to not cross these lines even if the local consequentialist argument seems to check out. I make the same sort of argument for being first in the race toward an extinction-level event. Building an extinction-machine is wrong, and arguing you’ll be slightly more likely to pull back first does not stop it from being something you should not do.
    I think when you look back at a civilization that raced to the precipice and committed auto-genocide, and ask where the lines in the sand should’ve been drawn, the most natural one will be “building the extinction machine, and competing to be first to do so”. So it is wrong to cross this line, even for locally net positive tradeoffs.
    What links here?
    Ben Pace's comment on Mikhail Samin’s Shortform by Mikhail Samin (12 Jun 2025 0:57 UTC; 2 points)
    - Neel Nanda 14 Jun 2025 18:18 UTC
      15 points
      3
      Parent
      I think this just takes it up one level of meta. We are arguing about the consequences of a ruleset. You are arguing that your ruleset has better consequences, while others disagree. And so you try to censure these people—this is your prerogative, but I don’t think this really gets you out of the regress of people disagreeing about what the best actions are.
      
      Engaging with the object level of whether your proposed ruleset is a good one, I feel torn.
      
      For your analogy of murder, I am very pro-not-murdering people, but I would argue this is convergent because it is broadly agreed upon by society. We all benefit from it being part of the social contract, and breaking that erodes the social contract in a way that harms all involved. If Anthropic unilaterally stopped trying to build AGI, I do not think this would significantly affect other labs, who would continue their work, so this feels disanalogous.
      
      And it is reasonable in extreme conditions (e.g. when those prohibitions are violated by others acting against you) to abandon standard ethical prohibitions. For example, I think it was just for Allied soldiers to kill Nazi soldiers in World War II. I think having nuclear weapons is terrible and questionable but I generally don’t support countries unilaterally abandoning their nuclear weapons, leaving them vulnerable to other nuclear-armed nations. Obviously, there are many disanalogies, but my point is that you need to establish how much a given deontological prohibition makes sense in unusual situations, rather than just appealing to moral intuition.
      
      I’m not here to defend Anthropic’s actions on the object level—they are not acting as I would in their situation, but they may have sound reasons. But they are not acting badly enough that I confidently assume bad faith. They have had positive effects, like their technical research and helping RSPs become established, though I disagree with some of their policy positions.
      
      Another disanalogy between this and murder is that there are multiple AGI labs, and only one needs to cause human extinction. If Anthropic ceased to exist, other labs would continue this work. I’d argue that Anthropic is accelerating development by researching capabilities and intensifying commercial pressure, and this is bad. But when arguing about acceleration’s harm, we must weigh it against Anthropic’s potential good—this becomes more of an apples-to-apples comparison rather than a clear deontological violation.
      - Ben Pace 14 Jun 2025 19:45 UTC
        10 points
        7
        Parent
        If Anthropic unilaterally stopped trying to build AGI, I do not think this would significantly affect other labs, who would continue their work, so this feels disanalogous.
        Not a crux for either of us, but I disagree. When is the last time that someone shut down a multi-billion dollar profit arm of a company due to ethics, and especially due to the threat of extinction? If Anthropic announced they were ceasing development / shutting down because they did not want to cause an extinction-level event, this would have massive ramifications through society as people started to take this consequence more seriously, and many people would become more scared, including friends of employees at the other companies and more of the employees themselves. This would have massive positive effects.
        For your analogy of murder, I am very pro-not-murdering people, but I would argue this is convergent because it is broadly agreed upon by society. We all benefit from it being part of the social contract, and breaking that erodes the social contract in a way that harms all involved.
        This implies one should never draw lines in the sand about good/bad behavior if society has not reached consensus on it. In contrast, I think it is good to not do many behaviors even if your society has not yet reached consensus on it. For instance, if a government has not yet regulated that language-models shouldn’t encourage people to kill themselves, and then language models do and 1000s of ppl die (NB: this is a fictional example), this isn’t ethically fine just because it wasn’t illegal. I think we should act in ways that we believe will make sense as policies even before they have achieved consensus, and this is part of what makes someone engaged in ethics rather than in simply “doing what you are told”.
        You bring up Nazism. I think that it was wrong to go along with Nazism even though the government endorsed it. Surely there are ethical lines against causing an extinction-level event even if your society has not come to a consensus on where those lines are yet. And even if we never achieve consensus, everyone should still attempt to figure out the answer and live by it, rather than give up on having such ethical lines.
        I’m not here to defend Anthropic’s actions on the object level—they are not acting as I would in their situation, but they may have sound reasons. But they are not acting badly enough that I confidently assume bad faith. They have had positive effects, like their technical research and helping RSPs become established, though I disagree with some of their policy positions.
        Habryka wrote about how the bad-faith comment was a non-sequiter in another thread. I will here say that the “I’m not here to defend Anthropic’s actions on the object level” doesn’t make sense to me. I am saying they should stop racing, and you are saying they should not, and we are exchanging arguments for this, currently coming down to the ethics of racing toward an extinction-level event and whether there are deontological lines against doing that. I agree that you are not attempting to endorse all the details of what they are doing beyond that, but I believe you are broadly defending their IMO key object-level action of doing multi-billion dollar AI capabilities research and building massive industry momentum.
        You are arguing that your ruleset has better consequences, while others disagree. And so you try to censure these people—this is your prerogative, but I don’t think this really gets you out of the regress of people disagreeing about what the best actions are.
        It reads to me that you’re just talking around the point here. I said that people shouldn’t race toward extinction-level threats for deontological reasons, you said we should evaluate the direct consequences, I said deontological reasons are endorsed by a consequentialist framework so we should analyze it deontologically, and now you’re saying that I’m conceding the initial point that we should be doing the consequentialist analysis. No, I’m saying we should do a deontological analysis, and this is in conflict with you saying we should just judge based on the direct consequences that we know how to estimate.
        I’d argue that Anthropic is accelerating development by researching capabilities and intensifying commercial pressure, and this is bad. But when arguing about acceleration’s harm, we must weigh it against Anthropic’s potential good—this becomes more of an apples-to-apples comparison rather than a clear deontological violation.
        You keep trying to engage me in this consequentialist analysis, and say that sometimes (e.g. during times of war) the deontological rules can have exceptions, but you have not argued for why this is an exception. If people around you in society start committing murder, would you then start murdering? If people around you started lying, would you then start lying? I don’t think so. Why then, if people around you are racing to an extinction-level event, does the obvious rule of “do not race toward an extinction-level event” get an exception? Other people doing things that are wrong (even if they get away with it!) doesn’t make those things right.
        Neel Nanda 14 Jun 2025 21:49 UTC
        6 points
        1
        Parent
        The point I was trying to make is that, if I understood you correctly, you were trying to appeal to common sense morality that deontological rules like this are good on consequentialist grounds. I was trying to give examples why I don’t think this immediately follows and you need to actually make object level arguments about this and engage with the counter arguments. If you want to argue for deontological rules, you need to justify why those rules
        
        I am not trying to defend the claim that I am highly confident that what Anthropic is doing is ethical and net good for the world, but I am trying to defend the claim that there are vaguely similar plans to Anthropics that I would predict are net good in expectation, e.g., becoming a prominent actor then leveraging your influence to push for good norms and good regulations. Your arguments would also imply that plans like that should be deontologically prohibited and I disagree.
        
        I don’t think this follows from naive moral intuition. A crucial disanalogy with murder is that if you don’t kill someone, the counterfactual is that the person is alive. While if you don’t race towards AGI, the counterfactual is that maybe someone else makes it and we die anyway. This means that we need to be engaging in discussion about the consequences of there being another actor pushing for this, the consequences of other actions this actor may take, and how this all nets out, which I don’t feel like you’re doing.
        
        I expect AGI to be either the best or worse thing that has ever happened, and this means that important actions will typically be high variance, with major positive or negative consequences. Declining to engage in things with the potential for high negative consequences severely restricts your action space. And given that it’s plausible that there’s a terrible outcome even if we do nothing, I don’t think the act-omission distinction applies
        Ben Pace 14 Jun 2025 22:30 UTC
        6 points
        4
        Parent
        I am not trying to defend the claim that I am highly confident that what Anthropic is doing is ethical and net good for the world, but I am trying to defend the claim that there are vaguely similar plans to Anthropics that I would predict are net good in expectation, e.g., becoming a prominent actor then leveraging your influence to push for good norms and good regulations. Your arguments would also imply that plans like that should be deontologically prohibited and I disagree.
        Thank you for clarifying, I think I understand now. I’m hearing you’re not arguing in defense of Anthropic’s specific plan but in defense of there being some part of the space of plans being good that involve racing to build something that has a (say) >20% chance of causing an extinction-level event, that Anthropic may or may not fall into.
        A crucial disanalogy with murder is that if you don’t kill someone, the counterfactual is that the person is alive. While if you don’t race towards AGI, the counterfactual is that maybe someone else makes it and we die anyway.
        This isn’t disanalagous. As I have already said in this thread, you are not allowed to murder someone even if someone else is planning to murder them. If you find out multiple parties are going to murder Bob, you are not now allowed to murder Bob in a way that is slightly less likely to be successful.
        Crucially it is not to be assumed that we will build AGI in the next 1-2 decades. If the countries of the world decided to ban training runs of a particular size, because we don’t want to take this sort of extinction-level risk, then it would not happen. Assuming this out of the hypothesis space will get you into bad ethical territory. Suppose a military general says “War is inevitable, the only question is how fast it’s over when it starts and how few deaths there are.” This general would never take responsibility for instigating. Similarly if you assume with certainty that AGI will be developed risking in next few decades, you absolve yourself of all responsibility for being the one who does so.
        Declining to engage in things with the potential for high negative consequences severely restricts your action space.
        I think you are failing to understand the concept of deontology by replacing “breaks deontological rules” with “highly negative consequences”. Deontology doesn’t say “you can tell a lie if it saves you from telling two lies later” or “lying is wrong unless you get a lot of money for it”. It says “don’t tell lies”. There are exceptional circumstances for all rules, but unless you’re in an exceptional circumstance, you treat them as rules, and don’t treat violations as integers to be traded against each other.
        
        When the stakes get high it is not time to start lying, cheating, killing, or unilaterally betting the extinction of the human race. If it is for someone, then they simply can’t be trusted to follow these ethical principles when it matters.
        Neel Nanda 15 Jun 2025 0:50 UTC
        9 points
        2
        Parent
        
        Thank you for clarifying, I think I understand now. I’m hearing you’re not arguing in defense of Anthropic’s specific plan but in defense of there being some part of the space of plans being good that involve racing to build something that has a (say) >20% chance of causing an extinction-level event, that Anthropic may or may not fall into.
        
        Yes that is correct
        
        This isn’t disanalagous. As I have already said in this thread, you are not allowed to murder someone even if someone else is planning to murder them. If you find out multiple parties are going to murder Bob, you are not now allowed to murder Bob in a way that is slightly less likely to be successful.
        
        I disagree. If a patient has a deadly illness then I think it is fine for a surgeon to perform a dangerous operation to try to save their life. I think the word murder is obfuscating things and suggest we instead talk in terms of “taking actions that may lead to death”, which I think is more analogous—hopefully we can agree Anthropic won’t intentionally cause human extinction. I think it is totally reasonable to take actions that net decrease someone’s probability of dying, while introducing some novel risks.
        
        I think you are failing to understand the concept of deontology by replacing “breaks deontological rules” with “highly negative consequences”. Deontology doesn’t say “you can tell a lie if it saves you from telling two lies later” or “lying is wrong unless you get a lot of money for it”. It says “don’t tell lies”. There are exceptional circumstances for all rules, but unless you’re in an exceptional circumstance, you treat them as rules, and don’t treat violations as integers to be traded against each other.
        
        I think we’re talking past each other. I understood you as arguing “deontological rules against X will systematically lead to better consequences than trying to evaluate each situation carefully, because humans are fallible”. I am trying to argue that your proposed deontological rule does not obviously lead to better consequences as an absolute rule. Please correct me if I have misunderstood.
        
        I am arguing that “things to do with human extinction from AI, when there’s already a meaningful likelihood” are not a domain where ethical prohibitions like “never do things that could lead to human extinction” are productive. For example, you help run LessWrong, which I’d argue has helped raise the salience of AI x-risk, which plausibly has accelerated timelines. I personally think this is outweighed by other effects, but that’s via reasoning about the consequences. Your actions and Anthropic’s feel more like a difference in scale than a difference in kind.
        
        Assuming this out of the hypothesis space will get you into bad ethical territory
        
        I am not arguing that AI x-risk is inevitable, in fact I’m arguing the opposite. AI x-risk is both plausible and not inevitable. Actions to reduce this seem very valuable. Actions that do this will often have side effects that increase risk in other ways. In my opinion, this is not sufficient cause to immediately rule them out.
        
        Meanwhile, I would consider anyone pushing hard to make frontier AI to be highly reckless if they were the only one who could cause extinction, and they could unilaterally stop—this is a way to unilaterally bring risk to zero, which is better than any other action. But Anthropic has no such action available, and so I want them to take the actions that reduce risk as much as possible. And there are arguments for proceeding and arguments for stopping.
        Ben Pace 15 Jun 2025 19:17 UTC
        8 points
        6
        Parent
        > As I have already said in this thread, you are not allowed to murder someone even if someone else is planning to murder them. If you find out multiple parties are going to murder Bob, you are not now allowed to murder Bob in a way that is slightly less likely to be successful.
        I disagree. If a patient has a deadly illness then I think it is fine for a surgeon to perform a dangerous operation to try to save their life. I think the word murder is obfuscating things and suggest we instead talk in terms of “taking actions that may lead to death”, which I think is more analogous—hopefully we can agree Anthropic won’t intentionally cause human extinction. I think it is totally reasonable to take actions that net decrease someone’s probability of dying, while introducing some novel risks.
        This is simplifying away key details.
        If you go up to a person with a deadly illness and non-consensually do a dangerous surgery on them, this is wrong. If you kill them via this, their family has a right to sue you / prosecute you for murder. Once again, simply because some bad outcome is likely, you do not have ethical mandate to now go and cause it yourself. Deontology is typically about forbidding classes of action that on net make the world worse even when locally you have a good reason. Talking about “taking actions that lead to death” explicitly obfuscates the mechanism. I know you won’t endorse this once I point it out, but under this strictly-consequentialist framework “blogging on LessWrong about extinction-risk from AI” and “committing murder” are just two different “actions that lead to death” and neither can be thought of as having different deontological lines drawn. On the contrary, “don’t commit murder” and “don’t build a doomsday machine” are simple and natural deontological rules, whereas “don’t build a blogging platform with unusually high standards for truthseeking” is not.
        I am trying to argue that your proposed deontological rule does not obviously lead to better consequences as an absolute rule. Please correct me if I have misunderstood.
        I am not trying to argue for an especially novel deontological rule… “building a doomsday machine” is wrong. It’s a far greater sin than murder. I think you’d do better to think of the AI companies as more like competing political factions each of whom’s base is very motivated toward committing a genocide against their neighbors. If your political faction commits a genocide; and you were merely a top-200 ranked official who didn’t particularly want a genocide, you still bear moral responsibility for it even though you only did paperwork and took meetings and maybe worked in a different department. Just because there are two political factions whose bases are uncomfortably attracted to the idea of committing genocide does not now make it ethically clear for you to make a third one that hungers for genocide but has wiser people in charge.
        I am not advocating for some new interesting deontological rule. I am arguing that the obvious rule against building a doomsday machine applies here straightforwardly. Deontological violations don’t stop being bad just because other people are committing them. You cannot commit murder just because other people do, and you cannot build a doomsday machine just because other people are. You generally shouldn’t build doomsday machines even if you have a good reason. To argue against this you should show why deontological rules break down, and then apply it to this case, but the doctor example you gave doesn’t show that, because by-default you aren’t actually allowed to non-consensually do risky surgeries on people even if it makes sense on the consequentialist calculus.
        Neel Nanda 15 Jun 2025 19:58 UTC
        14 points
        3
        Parent
        I continue to feel like we’re talking past each other, so let me start again. We both agree that causing human extinction is extremely bad. If I understand you correctly, you are arguing that it makes sense to follow deontological rules, even if there’s a really good reason breaking them seems locally beneficial, because on average, the decision theory that’s willing to do harmful things for complex reasons performs badly.
        
        The goal of my various analogies was to point out that this is not actually a fully correcct statement about common sense morality. Common sense morality has several exceptions for things like having someone’s consent to take on a risk, someone doing bad things to you, and innocent people being forced to do terrible things.
        
        Given that exceptions exist, for times when we believe the general policy is bad, I am arguing that there should be an additional exception stating that: if there is a realistic chance that a bad outcome happens anyway, and you believe you can reduce the probability of this bad outcome happening (even after accounting for cognitive biases, sources of overconfidence, etc.), it can be ethically permissible to take actions that have side effects around increasing the probability of the bad outcome in other ways.
        
        When analysing the reasons I broadly buy the deontological framework for “don’t commit murder”, I think there are some clear lines in the sand, such as maintaining a valuable social contract, and how if you do nothing, the outcomes will be broadly good. Further, society has never really had to deal with something as extreme as doomsday machines, which makes me hesitant to appeal to common sense morality at all. To me, the point where things break down with standard deontological reasoning is that this is just very outside the context where such priors were developed and have proven to be robust. I am not comfortable naively assuming they will generalize, and I think this is an incredibly high stakes thing where far and away the only thing I care about is taking the actions that will actually, in practice, lead to a lower probability of extinction.
        
        Regarding your examples, I’m completely ethically comfortable with someone making a third political party in a country where the population has two groups who both strongly want to cause genocide to the other. I think there are many ways that such a third political party could reduce the probability of genocide, even if it ultimately comprises a political base who wants negative outcomes.
        
        Another example is nuclear weapons. From a certain perspective, holding nuclear weapons is highly unethical as it risks nuclear winter, whether from provoking someone else or from a false alarm on your side. While I’m strongly in favour of countries unilaterally switching to a no-first-use policy and pursuing mutual disarmament, I am not in favour of countries unilaterally disarming themselves. By my interpretation of your proposed ethical rules, this suggests countries should unilaterally disarm. Do you agree with that? If not, what’s disanalogous?
        
        COVID-19 would be another example. Biology is not my area of expertise, but as I understand it, governments took actions that were probably good but risked some negative effects that could have made things worse. For example, widespread use of vaccines or antivirals, especially via the first-doses-first approach, plausibly made it more likely that resistant strains would spread, potentially affecting everyone else. In my opinion, these were clearly net-positive actions because the good done far outweighed the potential harm.
        
        You could raise the objection that governments are democratically elected while Anthropic is not, but there were many other actors in these scenarios, like uranium miners, vaccine manufacturers, etc., who were also complicit.
        
        Again, I’m purely defending the abstract point of “plans that could result in increased human extinction, even if by building the doomsday machine yourself, are not automatically ethically forbidden”. You’re welcome to critique Anthropic’s actual actions as much as you like. But you seem to be making a much more general claim.
        Ben Pace 16 Jun 2025 0:23 UTC
        21 points
        24
        Parent
        If I understand you correctly, you are arguing that it makes sense to follow deontological rules, even if there’s a really good reason breaking them seems locally beneficial, because on average, the decision theory that’s willing to do harmful things for complex reasons performs badly.
        Hm… I would say that one should follow deontological rules like “don’t lie” and “don’t steal” and so on because we fail to understand or predict the knock-on consequences. For instance they can get the world into a much worse equilibrium of mutual liars/stealers, for instance, in ways that are hard to predict. And being a good person can get the world into a much better equilibrium of mutually-honorable people in ways that are hard to predict. And also because, if it does screw up in some hard to predict way, then when you look back, it will often be the easiest line in the sand to draw.
        
        For instance, if SBF is wondering at what point he could have most reliably intervened on his whole company collapsing and ruining the reputation of things associated with it, he might talk about certain deals he made or strategic plays with Binance or the US Govt, for he is not a very ethical person; I would talk about not taking customer deposits.
        If and when we get to an endgame where tons of AI systems are sociopathically lying and stealing money and ultimately killing the humans, I suspect people of SBF’s mindset again to talk about how the US and China should’ve played things, or how Musk should’ve played OpenAI, and how Amodei should’ve done played with DC. And I will talk about not racing to develop the unaligned AI systems in the first place.
        To me, the point where things break down with standard deontological reasoning is that this is just very outside the context where such priors were developed and have proven to be robust. I am not comfortable naively assuming they will generalize, and I think this is an incredibly high stakes thing where far and away the only thing I care about is taking the actions that will actually, in practice, lead to a lower probability of extinction.
        I don’t really know why you think that this generalization can’t be made to things we’ve not seen before. So many things I experience haven’t been seen before in history. How many centuries have we had to develop ethical intuitions for how to write on web forums? There are still answers to these questions, and I can identify ethical and unethical behaviors, as can you (e.g. sockpuppeting, doxing, brigading, etc). There can be ethical lines in novel situations, not only historically common ones.
        Another example is nuclear weapons. From a certain perspective, holding nuclear weapons is highly unethical as it risks nuclear winter, whether from provoking someone else or from a false alarm on your side. While I’m strongly in favour of countries unilaterally switching to a no-first-use policy and pursuing mutual disarmament, I am not in favour of countries unilaterally disarming themselves. By my interpretation of your proposed ethical rules, this suggests countries should unilaterally disarm. Do you agree with that? If not, what’s disanalogous?
        I am not sure what I would propose if I believed Nuclear Winter was a serious existential threat; it seems plausible to me that the ethical thing would be to unilaterally disarm. I suspect that at the very least if I were a country I would openly and aggressively campaign for mutual disarmament. (If any AI capabilities company openly campaigned for making it illegal to develop AI then I suspect I would consider that plausibly quite ethical).
        I’m purely defending the abstract point of “plans that could result in increased human extinction, even if by building the doomsday machine yourself, are not automatically ethically forbidden”.
        To be clear, I think you’re defending a somewhat stronger claim. You write further up thread:
        I am not trying to defend the claim that I am highly confident that what Anthropic is doing is ethical and net good for the world, but I am trying to defend the claim that there are vaguely similar plans to Anthropics that I would predict are net good in expectation, e.g., becoming a prominent actor then leveraging your influence to push for good norms and good regulations. Your arguments would also imply that plans like that should be deontologically prohibited and I disagree.
        My current stance is that all actors currently in this space are doing things prohibited by basic deontology. This is not merely an unfortunate outcome, but is a grave sin, for they are building doomsday machines, likely the greatest evil that we will ever experience in our history (regardless of if they are successful). So I want to emphasize that the boundary here is not between “better and worse plans” but between “moral murky and morally evil plans”. Insofar as you commit a genocide or worse, history should remember your names as people of shame who we must take pain never to repeat. Insofar as you played with the idea, thought you could control it, and failed, then history should still think of you this way.
        I believe we disagree over where the deontological lines are, given you are defending “vaguely similar plans to Anthropic’s”. Perhaps you could point to where you think they are? Presumably you think that a Larry Page style “this is just the next stage in evolution” indifference to human extinction AI-project would be morally wrong?
        Here’s two lines that I think might cross into being acceptable [edit: or rather, “only morally murky”] from my perspective.
        I think it might be appropriate to risk building a doomsday machine if, loudly and in-public, you told everyone “I AM BUILDING A POTENTIAL DOOMSDAY MACHINE, AND YOU SHOULD SHUT MY INDUSTRY DOWN. IF YOU DON’T THEN I WILL RIDE THIS WAVE AND ATTEMPT TO IMPROVE IT, BUT YOU REALLY SHOULD NOT LET ANYONE DO WHAT I AM DOING.” And was engaged in serious lobbying and advertising efforts to this effect.
        I think it could possibly be acceptable to build an AI capabilities company if you committed to never releasing or developing any frontier capabilities AND if all employees also committed not to leave and release frontier capabilities elsewhere AND you were attempting to use this to differential improve society’s epistemics and awareness of AI’s extinction-level threat. Though this might still cause too much economic investment into AI as an industry, I’m not sure.
        I of course do not think any current project looks superficially like these.
        Expand this thread
        Neel Nanda 16 Jun 2025 1:11 UTC
        1 point
        −1
        Parent
        
        Here’s two lines that I think might cross into being acceptable from my perspective.
        
        I think it might be appropriate to risk building a doomsday machine if, loudly and in-public, you told everyone “I AM BUILDING A POTENTIAL DOOMSDAY MACHINE, AND YOU SHOULD SHUT MY INDUSTRY DOWN. IF YOU DON’T THEN I WILL RIDE THIS WAVE AND ATTEMPT TO IMPROVE IT, BUT YOU REALLY SHOULD NOT LET ANYONE DO WHAT I AM DOING.” And was engaged in serious lobbying and advertising efforts to this effect.
        
        I think it could possibly be acceptable to build an AI capabilities company if you committed to never releasing or developing any frontier capabilities AND if all employees also committed not to leave and release frontier capabilities elsewhere AND you were attempting to use this to differential improve society’s epistemics and awareness of AI’s extinction-level threat. Though this might still cause too much economic investment into AI as an industry, I’m not sure.
        
        I of course do not think any current project looks superficially like these.
        
        Okay, after reading this it seems to me that we broadly do agree and are just arguing over price. I’m arguing that it is permissible to try to build a doomsday machine if there are really good reasons to believe it is net good for the probability of doomsday. It sounds like you agree, and give two examples of what “really good reasons” could be. I’m sure we disagree on the boundaries of where the really good reasons lie, but I’m trying to defend the point that you actually need to think about the consequences.
        
        What am I missing? Is it that you think these two are really good reasons, not because of the impact on the consequences, but because of the attitude/framing involved?
        ryan_greenblatt 16 Jun 2025 3:57 UTC
        17 points
        18
        Parent
        I’m not Ben, but I think you don’t understand. I think explaining what you are doing loudly in public isn’t like “having a really good reason to believe it is net good” is instead more like asking for consent.
        
        Like you are saying “please stop me by shutting down this industry” and if you don’t get shut down, that it is analogous to consent: you’ve informed society about what you’re doing and why and tried to ensure that if everyone else followed a similar sort of policy we’d be in a better position.
        
        (Not claiming I agree with Ben’s perspective here, just trying to explain it as I understand it.)
        Neel Nanda 16 Jun 2025 11:40 UTC
        4 points
        0
        Parent
        Ah! Thanks a lot for the explanation, that makes way more sense, and is much weaker than what I thought Ben was arguing for. Yeah this seems like a pretty reasonable position, especially “take actions where if everyone else took them we would be much better off” and I am completely fine with holding Anthropic to that bar. I’m not fully sold re the asking for consent framing, but mostly for practical reasons—I think there’s many ways that society is not able to act constantly, and the actions of governments on many issues are not a reflection of the true informed will of the people, but I expect there’s some reframe here that I would agree with.
        habryka 16 Jun 2025 16:45 UTC
        2 points
        0
        Parent
        and is much weaker than what I thought Ben was arguing for.
        I don’t think Ryan (or I) was intending to imply a measure of degree, so my guess is unfortunately somehow communication still failed. Like, I don’t think Ryan (or Ben) are saying “it’s OK to do these things you just have to ask for consent”. Ryan was just trying to point out a specific way in which things don’t bottom out in consequentialist analysis.
        If you end up walking away with thinking that Ben believes “the key thing to get right for AI companies is to ask for consent before building the doomsday machine”, which I feel like is the only interpretation of what you could mean by “weaker” that I currently have, then I think that would be a pretty deep misunderstanding.
        Neel Nanda 16 Jun 2025 18:50 UTC
        4 points
        0
        Parent
        OK, I’m going to bow out of the conversation at this point, I’d guess further back and forth won’t be too productive. Thanks all!
        Ben Pace 17 Jun 2025 6:49 UTC
        4 points
        2
        Parent
        There is something important to me in this conversation about not trusting one’s consequentialist analysis when evaluating proposals to violate deontological lines, and from my perspective you still haven’t managed to paraphrase this basic ethical idea or shown you’ve understood it, which I feel a little frustrated over. Ah well. I still have been glad of this opportunity to argue it through, and I feel grateful to Neel for that.
      - Mikhail Samin 14 Jun 2025 19:31 UTC
        5 points
        5
        Parent
        I actually agree with Neel that, in principle, an AI lab could race for AGI while acting responsibly and IMO not violating deontology.
        Releasing models exactly at the level of their top competitor, immediately after the competitor’s release and a bit cheaper; talking to the governments and lobbying for regulation; having an actually robust governance structure and not doing a thing that increases the chance of everyone dying.
        This doesn’t describe any of the existing labs, though.
      - habryka 14 Jun 2025 18:55 UTC
        4 points
        0
        Parent
        But they are not acting badly enough that I confidently assume bad faith
        I like a lot of your comment, but this feels like a total non-sequitur. Did anyone involved in this conversation say that Anthropic was acting under false pretenses? I don’t think anyone brought up concerns that rest on assumptions of bad faith (though to be clear, Anthropic employees have mostly told me I should assume something like bad faith from Anthropic as an institution, and that people should try to hold it accountable the same way any other AI lab, and to not straightforwardly trust statements Anthropic makes without associated commitments, so I do think I would assume bad faith, but it mostly just feels besides the point in this discussion).
        Neel Nanda 14 Jun 2025 18:59 UTC
        2 points
        0
        Parent
        Ah, sorry, I was thinking of Mikhail’s reply here, not anything you or Ben said in this conversation https://www.lesswrong.com/posts/BqwXYFtpetFxqkxip/mikhail-samin-s-shortform?commentId=w2doi6TzjB5HMMfmx
        
        But yeah, I’m happy to leave that aside, I don’t think it’s cruxy
        habryka 14 Jun 2025 19:01 UTC
        2 points
        0
        Parent
        Makes sense! I hadn’t read that subthread, so was additionally confused.
      - Mikhail Samin 14 Jun 2025 19:35 UTC
        2 points
        −3
        Parent
        it was just for Allied soldiers to kill Nazi soldiers in World War II
        Killing anyone who hasn’t done anything to lose deontological protection is wrong and clearly violates deontology.
        As a Nazi soldier, you lose deontological protection.
        There are many humans who are not even customers of any of the AI labs; they clearly have not lost deontological protection, and it’s not okay to risk killing them without their consent.
        Neel Nanda 15 Jun 2025 11:39 UTC
        6 points
        5
        Parent
        I disagree with this as a statement about war, I’m sure a bunch of Nazi soldiers were conscripted, did not particularly support the regime, and were participating out of fear. Similarly, malicious governments have conscripted innocent civilians and kept them in line through fear in many unjust wars throughout history. And even people who volunteered may have done this due to being brainwashed by extensive propaganda that led to them believing they were doing the right thing. The real world is messy and strict deontological prohibitions break down in complex and high stakes situations, where inaction also has terrible consequences—I strongly disagree with a deontological rule that says countries are not about to defend themselves against innocent people forced to do terrible things
        Mikhail Samin 15 Jun 2025 18:06 UTC
        2 points
        0
        Parent
        My deontology prescribes not to join a Nazi army regardless of how much fear you’re in. It’s impossible to demand of people to be HPMOR!Hermione, but I think this standard works fine for real-world situations.
        (While I do not wish any Nazi soldiers death, regardless of their views or reasons for their actions. There’s a sense in which Nazi soldiers are innocent regardless of what they’ve done; none of them are grown up enough to be truly responsible for their actions. Every single death is very sad, and I’m not sure there has ever been even a single non-innocent human. At the same time, I think it’s okay to kill Nazi soldiers (unless they’re in a process of surrenderring, etc.) or lie to them, and they don’t have deontological protection.)
        You’re arguing it’s okay to defend yourself against innocent people forced to do terrible things. I agree with that, and my deontology agrees with that.
        At the same time, killing everyone because otherwise someone else could’ve killed them with a higher chance = killing many people who aren’t ever going to contribute to any terrible things. I think, and my deontology thinks, that this is not okay. Random civilians are not innocent Nazi soldiers; they’re simply random innocent people. I ask of Anthropic to please stop working towards killing them.
        Neel Nanda 15 Jun 2025 18:24 UTC
        3 points
        0
        Parent
        And do you feel this way because you believe that the general policy of obeying such deontological prohibitions will on net result in better outcomes? Or because you think that even if there were good reason to believe that following a different policy would lead to better empirical outcomes, your ethics say that you should be deontologically opposed regardless?
        Mikhail Samin 15 Jun 2025 19:12 UTC
        5 points
        2
        Parent
        I think the general policy of obeying such deontological rules leads to better outcomes; this is the reason for having deontology in the first place. (I agree with that old post on what to do when it feels like there’s a good reason to believe that following a different policy would lead to better outcomes.)
        habryka 15 Jun 2025 19:15 UTC
        4 points
        1
        Parent
        (Just as a datapoint, while largely agreeing with Ben here, I really don’t buy this concept of deontological protection of individuals. I think there are principles we have about when it’s OK to kill someone, but I don’t think the lines we have here route through individuals losing deontological protection.
        Killing a mass murderer while he is waiting for trial is IMO worse than killing a civilian in collateral damage as part of taking out an active combatant, because it violates and messes with different processes, which don’t generally route through individuals “losing deontological protection” but instead are more sensitive to the context the individuals are in)
        Mikhail Samin 16 Jun 2025 8:17 UTC
        2 points
        0
        Parent
        Locally: can you give an example of when it’s okay to kill someone who didn’t lose deontological protection, where you want to kill them because of the causal impact of their death?
        Ben Pace 17 Jun 2025 18:00 UTC
        4 points
        2
        Parent
        To me the issue goes the other way. The idea of “losing deontological protection” suggests I’m allowed to ignore deontological rules when interacting with someone. But that seems obviously crazy to me. For instance I think there’s a deontological injunction against lying, but just because someone lies doesn’t now mean I’m allowed to kill them. It doesn’t even mean I’m allowed to lie to them. I think lying to them would still be about as wrong as it was before, not a free action I can take whenever I feel like it.
        habryka 17 Jun 2025 18:29 UTC
        3 points
        2
        Parent
        I mean, a very classical example that I’ve seen a few times in media is shooting a civilian who is about to walk into a minefield in which multiple other civilians or military members are located. It seems tragic but obviously the right choice to shoot them if they don’t heed your warning.
        IDK, I also think it’s the right choice to pull the lever in the trolley problem, though the choice becomes less obvious the more it involves active killing as opposed to literally pulling a lever.
    - Knight Lee 23 Jun 2025 6:53 UTC
      7 points
      −6
      Parent
      Sorry for replying to a dead thread but,
      Murder implies an intent to kill someone.
      Suppose I hire a hitman to kill you. But suppose there already are 3 hitmen trying to kill you, and I’m hoping my hitman would reach you first, and I know that my hitman has really bad aim. Once the first hitman reaches you and starts shooting, the other hitmen will freak out and run away, so I’m hoping you’re more likely to survive.
      I have no other options for saving you, since the only contact I have is a hitman, and he’s very bad at English and doesn’t understand any instructions except trying to kill someone.
      In this case, you can argue to the court that my plan to save you was retarded. But you cannot concede that my plan actually was a good idea consequentially, but deontologically unethical. Since I didn’t intend to kill anyone.
      Deontology only kicks in when your plan involves making someone die, or greatly increasing the chance someone dies.
      - Mikhail Samin 23 Jun 2025 15:34 UTC
        7 points
        1
        Parent
        I feel like this it’s actually a great analogy! The only difference is that if your hitman starts shooting and doesn’t kill anyone, you get infinite gold.
        You know that in real life you go to police instead of hiring a hitman, right?
        And I claim that it’s really not okay to hire a hitman who might lower the chance of the person ending up dead, especially when your brain is aware of the infinite gold part.
        The good strategy for anyone in that situation to follow is to go to the police or go public and not hire any additional hitmen.
        Knight Lee 24 Jun 2025 2:36 UTC
        3 points
        2
        Parent
        Yeah, it’s less deontologically bad than murder but I admit it’s still not completely okay.
        PS: Part of the reason I used the unflattering hitman analogy is because I’m no longer as optimistic about Anthropic’s influence.
        They routinely describe other problems (e.g. winning the race against China to defend democracy) with the same urgency as AI Notkilleveryoneism.
        The only way to believe that AI Notkilleveryoneism is still Anthropic’s main purpose, is to hope that,
        They describe a ton of other problems with the same urgency as AI Notkilleveryoneism, but that is only due to political necessity.
        At the same time, their apparent concern for AI Notkilleveryoneism is not just a political maneuver, but significantly more genuine.
        This “hope” is plausible since the people in charge of Anthropic prefer to live, and consistently claimed to have high P(doom).
        But it’s not certain, and there is circumstantial evidence suggesting this isn’t the case (e.g. their lobbying direction, and how they’re choosing people for their board of directors).
        Maybe $^{50 %}$ this hope is just cope :(
      - Ben Pace 23 Jun 2025 16:50 UTC
        6 points
        0
        Parent
        I don’t agree that deontology is about intent. Deontology is about action. Deontology is about not hiring hitmen to kill someone even if you have a really good reason, and even if your intent is good. Deontology is substantially about schelling lines of action where everything gets hard to predict and goes bad after you commit it.
        I imagine that your incompetent hitman has only like a 50% chance of succeeding, whereas the others have ~100%, that seems deontologically wrong to me.
        It seems plausible that what you mean to say by the hypothetical is that he has 0% chance.
        I admit this is more confusing and I’m not fully resolved on this.
        I notice I am confused about how you can get that epistemic state in real life.
        I observe that society will still prosecute you for attempted murder if you buy a hitman off the dark web, even one with a clearly incompetent reputation for ⁰⁄₁₀ kills or whatever.
        I think society’s ability to police this line is not as fine grained as you’re imagining, and so you should not buy incompetent hitmen in order to not kill your friend, unless you’re willing to face the consequences.
        Knight Lee 24 Jun 2025 1:57 UTC
        1 point
        0
        Parent
        To be honest I couldn’t resist writing the comment because I just wanted to share the silly thought :/
        Now that I think about it, it’s much more complicated. Mikhail Samin is right that the personal incentive of reaching AGI first really complicates the good intentions. And while a lot of deontology is about intent, it’s hyperbole to say that deontology is just intent.
        I think if your main intent is to save someone (and not personal gain), and your plan doesn’t require or seek anyone’s death, then it is deontologically much less bad than evil things like murder. But it may still be too bad for you to do, if you strongly lean towards deontology rather than consequentialism. Even if the court doesn’t find you guilty of first degree murder, it may still find you guilty of… some… things.
        One might argue that the enormous scale (risking everyone’s death instead of only one person), makes it deontologically worse. But I think the balance does not shift in favor of deontology and against consequentialism as we increase the scale (it might even shift a little in favor of consequentialism?).
    - MondSemmel 12 Jun 2025 8:25 UTC
      3 points
      0
      Parent
      That’s fair, but the deontological argument doesn’t work for anyone building the extinction machine who is unconvinced by x-risk arguments, or deludes themselves that it’s not actually an extinction machine, or that extinction is extremely unlikely, or that the extinction machine is the only thing that can prevent extinction (as in all the alignment via AI proposals) etc. etc.
      - Mikhail Samin 12 Jun 2025 13:34 UTC
        6 points
        4
        Parent
        This is not the case for many at Anthropic.
      - Ben Pace 12 Jun 2025 8:30 UTC
        6 points
        5
        Parent
        True; in general, many people who behave poorly do not know that they do so.