If I understand you correctly, you are arguing that it makes sense to follow deontological rules, even when there’s a really good reason to think that breaking them would be locally beneficial, because on average the decision theory that’s willing to do harmful things for complex reasons performs badly.
Hm… I would say that one should follow deontological rules like “don’t lie” and “don’t steal” because we fail to understand or predict the knock-on consequences. Lying and stealing can get the world into a much worse equilibrium of mutual liars and thieves in ways that are hard to predict, and being a good person can get the world into a much better equilibrium of mutually honorable people, also in ways that are hard to predict. And also because, if things do screw up in some hard-to-predict way, then when you look back, the rule will often turn out to have been the easiest line in the sand to draw.
For instance, if SBF is wondering at what point he could have most reliably intervened to prevent his whole company from collapsing and ruining the reputation of everything associated with it, he might talk about certain deals he made or strategic plays with Binance or the US Govt, for he is not a very ethical person; I would talk about not taking customer deposits.
If and when we get to an endgame where tons of AI systems are sociopathically lying, stealing money, and ultimately killing the humans, I suspect people of SBF’s mindset will again talk about how the US and China should’ve played things, or how Musk should’ve played OpenAI, and how Amodei should’ve played things in DC. And I will talk about not racing to develop the unaligned AI systems in the first place.
To me, the point where standard deontological reasoning breaks down is that this is far outside the context in which such priors were developed and have proven robust. I am not comfortable naively assuming they will generalize, and I think this is an incredibly high-stakes situation where the only thing I care about is taking the actions that will actually, in practice, lead to a lower probability of extinction.
I don’t really know why you think that this generalization can’t be made to things we’ve not seen before. So many things I experience haven’t been seen before in history. How many centuries have we had to develop ethical intuitions for how to write on web forums? There are still answers to these questions, and I can identify ethical and unethical behaviors, as can you (e.g. sockpuppeting, doxing, brigading, etc). There can be ethical lines in novel situations, not only historically common ones.
Another example is nuclear weapons. From a certain perspective, holding nuclear weapons is highly unethical as it risks nuclear winter, whether from provoking someone else or from a false alarm on your side. While I’m strongly in favour of countries unilaterally switching to a no-first-use policy and pursuing mutual disarmament, I am not in favour of countries unilaterally disarming themselves. By my interpretation of your proposed ethical rules, this suggests countries should unilaterally disarm. Do you agree with that? If not, what’s disanalogous?
I am not sure what I would propose if I believed Nuclear Winter was a serious existential threat; it seems plausible to me that the ethical thing would be to unilaterally disarm. I suspect that at the very least if I were a country I would openly and aggressively campaign for mutual disarmament. (If any AI capabilities company openly campaigned for making it illegal to develop AI then I suspect I would consider that plausibly quite ethical).
I’m purely defending the abstract point that “plans that could result in an increased chance of human extinction, even if by building the doomsday machine yourself, are not automatically ethically forbidden”.
To be clear, I think you’re defending a somewhat stronger claim. You write further up thread:
I am not trying to defend the claim that I am highly confident that what Anthropic is doing is ethical and net good for the world, but I am trying to defend the claim that there are vaguely similar plans to Anthropic’s that I would predict are net good in expectation, e.g., becoming a prominent actor and then leveraging your influence to push for good norms and good regulations. Your arguments would also imply that plans like that should be deontologically prohibited, and I disagree.
My current stance is that all actors currently in this space are doing things prohibited by basic deontology. This is not merely an unfortunate outcome, but a grave sin, for they are building doomsday machines, likely the greatest evil that we will ever experience in our history (regardless of whether they are successful). So I want to emphasize that the boundary here is not between “better and worse plans” but between “morally murky and morally evil plans”. Insofar as you commit a genocide or worse, history should remember your names as people of shame whose example we must take pains never to repeat. Insofar as you played with the idea, thought you could control it, and failed, then history should still think of you this way.
I believe we disagree over where the deontological lines are, given you are defending “vaguely similar plans to Anthropic’s”. Perhaps you could point to where you think they are? Presumably you think that a Larry Page-style AI project, with its “this is just the next stage in evolution” indifference to human extinction, would be morally wrong?
Here are two lines that I think might cross into being acceptable [edit: or rather, “only morally murky”] from my perspective.
I think it might be appropriate to risk building a doomsday machine if, loudly and in public, you told everyone “I AM BUILDING A POTENTIAL DOOMSDAY MACHINE, AND YOU SHOULD SHUT MY INDUSTRY DOWN. IF YOU DON’T THEN I WILL RIDE THIS WAVE AND ATTEMPT TO IMPROVE IT, BUT YOU REALLY SHOULD NOT LET ANYONE DO WHAT I AM DOING,” and were engaged in serious lobbying and advertising efforts to this effect.
I think it could possibly be acceptable to build an AI capabilities company if you committed to never releasing or developing any frontier capabilities, AND if all employees also committed not to leave and release frontier capabilities elsewhere, AND you were attempting to use this to differentially improve society’s epistemics and awareness of AI’s extinction-level threat. Though this might still cause too much economic investment into AI as an industry, I’m not sure.
I of course do not think any current project looks superficially like these.
Okay, after reading this it seems to me that we broadly do agree and are just arguing over price. I’m arguing that it is permissible to try to build a doomsday machine if there are really good reasons to believe it is net good for the probability of doomsday. It sounds like you agree, and give two examples of what “really good reasons” could be. I’m sure we disagree on the boundaries of where the really good reasons lie, but I’m trying to defend the point that you actually need to think about the consequences.
What am I missing? Is it that you think these two are really good reasons, not because of the impact on the consequences, but because of the attitude/framing involved?
I’m not Ben, but I think you don’t understand. I think explaining what you are doing loudly in public isn’t like “having a really good reason to believe it is net good”; it’s instead more like asking for consent.
Like you are saying “please stop me by shutting down this industry”, and if you don’t get shut down, then that is analogous to consent: you’ve informed society about what you’re doing and why, and tried to ensure that if everyone else followed a similar sort of policy we’d be in a better position.
(Not claiming I agree with Ben’s perspective here, just trying to explain it as I understand it.)
Ah! Thanks a lot for the explanation, that makes way more sense, and is much weaker than what I thought Ben was arguing for. Yeah, this seems like a pretty reasonable position, especially “take actions where if everyone else took them we would be much better off”, and I am completely fine with holding Anthropic to that bar. I’m not fully sold on the asking-for-consent framing, but mostly for practical reasons: I think there are many ways in which society is not able to act consistently, and the actions of governments on many issues are not a reflection of the true informed will of the people. But I expect there’s some reframe here that I would agree with.
and is much weaker than what I thought Ben was arguing for.
I don’t think Ryan (or I) was intending to imply a measure of degree, so my guess is that unfortunately communication somehow still failed. Like, I don’t think Ryan (or Ben) is saying “it’s OK to do these things, you just have to ask for consent”. Ryan was just trying to point out a specific way in which things don’t bottom out in consequentialist analysis.
If you end up walking away thinking that Ben believes “the key thing to get right for AI companies is to ask for consent before building the doomsday machine”, which I feel is the only interpretation of what you could mean by “weaker” that I currently have, then I think that would be a pretty deep misunderstanding.
OK, I’m going to bow out of the conversation at this point, I’d guess further back and forth won’t be too productive. Thanks all!
There is something important to me in this conversation about not trusting one’s consequentialist analysis when evaluating proposals to violate deontological lines, and from my perspective you still haven’t managed to paraphrase this basic ethical idea or shown that you’ve understood it, which I feel a little frustrated over. Ah well. I’m still glad to have had the opportunity to argue it through, and I feel grateful to Neel for that.