It all depends on what you mean by “sufficiently intelligent / coherent actors”. For example, in this comment Eliezer says that it should mean actors that “respond to offers, not to threats”, but in 15 years no one has been able to cash out what this actually means, AFAIK.
I don’t have a full answer, but my intuition goes roughly in the direction of “what would the other person do if they absolutely couldn’t communicate with you, not even indirectly (e.g. you couldn’t learn about how they interact with others)?”
If, in that counterfactual, they would leave you alone (i.e. not cause the outcome that you want to avoid), then their action in the actual world is definitely blackmail: the only reason they attempt to do that thing is to elicit your reaction.
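Here is a minimal toy sketch of that test, just to make the counterfactual explicit. Representing a “policy” as a function of whether the action can influence you at all is my own illustrative framing, not anything formal from Eliezer:

```python
# Toy version of the "no-communication counterfactual" test above.
# A "policy" is just a function from "can my action influence you at all?"
# to "do I take the harmful action?" -- an illustrative simplification.

def is_blackmail(policy) -> bool:
    """Blackmail: they would take the harmful action only when it can move you."""
    does_it_when_it_can_move_you = policy(can_influence_you=True)
    does_it_when_it_cannot = policy(can_influence_you=False)
    return does_it_when_it_can_move_you and not does_it_when_it_cannot

# "Release the photos unless you pay": pointless if you could never react.
blackmailer = lambda can_influence_you: can_influence_you
# "Build the road through your home because it is shorter": happens regardless.
road_builder = lambda can_influence_you: True

assert is_blackmail(blackmailer)       # done only to elicit your reaction
assert not is_blackmail(road_builder)  # done for its own sake
```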
As far as I can tell from Eliezer’s writing (mostly Planecrash), a threat is when someone will (counterfactually) purposefully minimize someone else’s utility function.
So releasing blackmail material would be a threat, but building a road through someone else’s home (if doing so offers slightly more utility than going around) wouldn’t be?
Actors could pre-commit to ignore any counterfactuals where someone purposefully minimizes their utility function, but then again would-be blackmailers could pre-commit to ignore such pre-commitments.
Maybe pre-committing to ignore threats is a kind of “pre-commitment Schelling point” that works if everyone does it? If all actors coordinated (even just by modeling other actors, without communication) to pre-commit to ignore threats, would the would-be extorters accept that?
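As a toy expected-value illustration of why that coordination point could be stable: the payoff numbers below are made up, and it assumes a would-be extorter who models its target and only issues threats it expects to profit from.

```python
# Made-up payoffs, purely to illustrate the "everyone ignores threats" idea.
PAYOUT_IF_TARGET_CAVES = 10  # what the extorter gains if the target gives in
COST_OF_THREATENING = 1      # cost of following through (or of backing down)

def extorter_expected_value(target_ignores_threats: bool) -> float:
    # If the target is known (via modelling them) to ignore threats, threatening
    # can only cost the extorter something; there is nothing to gain.
    return -COST_OF_THREATENING if target_ignores_threats else PAYOUT_IF_TARGET_CAVES

def extorter_threatens(target_ignores_threats: bool) -> bool:
    return extorter_expected_value(target_ignores_threats) > 0

assert not extorter_threatens(target_ignores_threats=True)   # no threat is made
assert extorter_threatens(target_ignores_threats=False)      # caving invites threats
```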
Yeah, but what does “purposefully minimize someone else’s utility function” mean? The source code just does stuff. What does it mean for it to be “on purpose”?
I believe “on purpose” in this case means doing something conditional on the other actor’s utility function disvaluing it.
So if you build an interstellar highway through someone’s planet because that is the fastest route, you are not “purposefully minimizing their utility function”, even if they strongly disvalue it.
If you build it through their planet only because they disvalue that, and would have built it around the planet if that were what they disvalued instead, then you are “purposefully minimizing their utility function”.
If you do so to prevent them from having a planet, or to make them react in some (useful to you) way, and would have done so even if they hadn’t disvalued their planet being destroyed, then you are not “purposefully minimizing their utility function”, I think?
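To make the three-way distinction concrete, here is a toy version of the highway example. The route names and “builder” functions are my own framing of the rule “a threat is an action taken because the other party disvalues it”:

```python
# Toy restatement of the three highway cases above.
# A "builder" maps "which route does the planet's owner disvalue?" to the route chosen.

def purposefully_minimizes(builder) -> bool:
    """True iff the builder picks whichever route the owner happens to disvalue."""
    return (builder(owner_disvalues="through") == "through"
            and builder(owner_disvalues="around") == "around")

# Case 1: goes through the planet because that is the fastest path, whatever the
# owner cares about -- not a threat, even though the owner strongly disvalues it.
fastest_path = lambda owner_disvalues: "through"
# Case 2: goes through only because the owner disvalues that, and would have gone
# around had the owner disvalued *that* instead -- a threat.
track_disvalue = lambda owner_disvalues: owner_disvalues
# Case 3: destroys the planet for object-level reasons (doesn't want them to have
# one) and would do so even if they didn't care -- not a threat either.
deny_planet = lambda owner_disvalues: "through"

assert not purposefully_minimizes(fastest_path)
assert purposefully_minimizes(track_disvalue)
assert not purposefully_minimizes(deny_planet)
```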
Let’s talk about a specific example: the Ultimatum Game. According to EY the rational strategy for the responder in the Ultimatum Game is to accept if the split is “fair” and otherwise reject in proportion to how unfair he thinks the split is. But the only reason to reject is to penalize the proposer for proposing an unfair split—which certainly seems to be “doing something conditional on the other actor’s utility function disvaluing it”. So why is the Ultimatum Game considered an “offer” and not a “threat”?
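For concreteness, here is a sketch of one common way of cashing out that rejection rule, as I understand it (my paraphrase, not necessarily EY’s exact numbers): accept unfair offers just rarely enough that the proposer’s expected take never beats what a fair split would have given them.

```python
import random

# Toy Ultimatum Game responder; the pie size, fair point, and epsilon are all
# illustrative choices, not anything taken from the original discussion.
PIE = 10.0
FAIR_SHARE = PIE / 2
EPSILON = 0.01  # make unfairness strictly unprofitable, not merely break-even

def accept_probability(offer_to_me: float) -> float:
    if offer_to_me >= FAIR_SHARE:
        return 1.0
    proposer_take = PIE - offer_to_me
    # Accept with probability just under FAIR_SHARE / proposer_take, so the
    # proposer's expected haul from an unfair split stays below FAIR_SHARE.
    return max(0.0, FAIR_SHARE / proposer_take - EPSILON)

def respond(offer_to_me: float) -> bool:
    return random.random() < accept_probability(offer_to_me)

# Offering me 1 (keeping 9) is accepted with probability ~0.55, so the proposer's
# expected take is ~4.9 -- worse than the 5 they would get from a fair split.
print(accept_probability(1.0))
```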
Good question.
I can’t tell whether saying that you will reject unfair splits would be a threat by the definition in my comment above.
For it to be a threat, you would have to do it only if the other person cares about the thing being split. But in the Ultimatum Game both players by definition care about it, so I have a hard time thinking about what you would do if someone offered you an unfair split of something they don’t care about (how can a split even be unfair if only one person values the thing being split?).