Rob Bensinger comments on A few misconceptions surrounding Roko’s basilisk

Rob Bensinger 9 Oct 2015 19:58 UTC
8 points

> I think the reason to cooperate is not to get the best personal outcome, but because you care about the other person.

If you have 100% identical consequentialist values to all other humans, then that means ‘cooperation’ and ‘defection’ are both impossible for humans (because they can’t be put in PDs). Yet it will still be correct to defect (given that your decision and the other player’s decision don’t strongly depend on each other) if you ever run into an agent that doesn’t share all your values. See The True Prisoner’s Dilemma.

This shows that the iterated dilemma and the dilemma-with-common-knowledge-of-rationality allow cooperation (i.e., giving up on your goal to enable someone else to achieve a goal you genuinely don’t want them to achieve), whereas loving compassion and shared values merely change goal-content. To properly visualize the PD, you need an actual value conflict—e.g., imagine you’re playing against a serial killer in a hostage negotiation. ‘Cooperating’ is just an English-language label; the important thing is the game-theoretic structure, which allows that sometimes ‘cooperating’ looks like letting people die in order to appease a killer’s antisocial goals.
- Vaniver 9 Oct 2015 20:39 UTC
  2 points
  
  > To properly visualize the PD, you need an actual value conflict
  
  I think belief conflicts might work, even if the same values are shared. Suppose you and I are at a control panel for three remotely wired bombs in population centers. Both of us want as many people to live as possible. One bomb will go off in ten seconds unless we disarm it, but the others will stay inert unless activated. I believe that pressing the green button causes all bombs to explode, and pressing the red button defuses the time bomb. You believe the same thing, but with the colors reversed. Both of us would rather that no buttons be pressed than both buttons be pressed, but each of us would prefer that just the defuse button be pressed, and that the other person not mistakenly kill all three groups. (Here, attempting to defuse is ‘defecting’ and not attempting to defuse is ‘cooperating’.)
  
  [Edit]: As written, in terms of lives saved, this doesn’t have the property that (D,D)>(C,D); if I press my button, you are indifferent between pressing your button or not. So it’s not true that D strictly dominates C, but the important part of the structure is preserved, and a minor change could make it so D strictly dominates C.
  - bogus 9 Oct 2015 20:50 UTC
    0 points
    
    > I think belief conflicts might work, even if the same values are shared.
    
    You can solve belief conflicts simply by trading in a prediction market with decision-contingent contracts (a “decision market”). Value conflicts are more general than that.
    - Vaniver 9 Oct 2015 23:00 UTC
      2 points
      
      > Value conflicts are more general than that.
      
      I think this is misusing the word “general.” Value conflicts are narrower than the full class of games that have the PD preference ordering. I do agree that value conflicts are harder to resolve than belief conflicts, but that doesn’t make them more general.
- bogus 9 Oct 2015 20:44 UTC
  0 points
  
  > If you have 100% identical consequentialist values to all other humans, then that means ‘cooperation’ and ‘defection’ are both impossible for humans (because they can’t be put in PDs). … To properly visualize the PD, you need an actual value conflict
  
  True, but the flip side of this is that efficiency (in Coasian terms) is precisely defined as pursuing 100% identical consequentialist values, where the shared “values” are determined by a weighted sum of each agent’s utility function (and the weights are typically determined by agent endowments).
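
For readers who want to check the payoff structure of Vaniver's three-bomb example earlier in the thread, here is a minimal sketch. It is one possible encoding, not something any commenter wrote: the function name `expected_survivors`, the labels `C` and `D`, and the choice to score each outcome as "population centers the player expects to survive, out of 3" are all assumptions made for illustration. Each outcome is scored by the beliefs of the player making the move, so the same table applies to both players with the button colors swapped.

```python
# Sketch of the three-bomb example, scored by each player's OWN beliefs.
# Assumption (hypothetical encoding): payoff = number of population centers
# the player expects to survive, out of 3.

C, D = "C", "D"  # C = do not press your button, D = press your "defuse" button


def expected_survivors(my_move: str, their_move: str) -> int:
    """Survivors out of 3, as believed by the player choosing my_move."""
    if their_move == D:
        # I believe the other player's button detonates all three bombs.
        return 0
    if my_move == D:
        # Only my button is pressed: I believe the time bomb is defused.
        return 3
    # Nobody presses anything: the time bomb goes off, the other two stay inert.
    return 2


payoff = {(me, them): expected_survivors(me, them) for me in (C, D) for them in (C, D)}

# D is at least as good as C against every move by the other player...
weakly_dominates = all(payoff[(D, t)] >= payoff[(C, t)] for t in (C, D))
# ...but not strictly better, because (D, D) ties with (C, D).
strictly_dominates = all(payoff[(D, t)] > payoff[(C, t)] for t in (C, D))
# Mutual restraint still beats mutual pressing, as in a Prisoner's Dilemma.
mutual_c_beats_mutual_d = payoff[(C, C)] > payoff[(D, D)]

print(payoff)
print(weakly_dominates, strictly_dominates, mutual_c_beats_mutual_d)
```

Under this encoding, pressing weakly but not strictly dominates not pressing, which matches the caveat in Vaniver's edit: the tie at (D,D) versus (C,D) is the only place the example departs from the strict PD ordering.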