Yes. Current AI policy is like people in a crowded room fighting over who gets to hold a bomb. It’s more important to defuse the bomb than it is to prevent someone you dislike from holding it.
That said, we’re currently not near any satisfactory solutions to corrigibility. And I do think it would be better for the world if it were easier (by some combination of technical and societal factors) to build AI that works for the good of all humanity than to build equally smart AI that follows the orders of a single person. So yes, we should focus research and policy effort toward making that happen, if we can.
And if we were in that world already, then I agree releasing all the technical details of an AI that follows the orders of a single person would be bad.
It’s more important to defuse the bomb than it is to prevent someone you dislike from holding it.
I think there is a key disanalogy to the situation with AGI: the analogy would be stronger if the bomb were likely to kill everyone, but also had some (perhaps very small) probability of conferring godlike power on whoever holds it. I.e., there is a tradeoff: decrease the probability of dying, at the expense of increasing the probability of S-risks from corrupt(ible) humans gaining godlike power.
If you agree that there exists that kind of tradeoff, I’m curious as to why you think it’s better to trade in the direction of decreasing probability-of-death for increased probability-of-suffering.
So, the question I’m most interested in is the one at the end of the post[1], viz.:
What (crucial) considerations should one take into account, when deciding whether to publish—or with whom to privately share—various kinds of corrigibility-related results?
[1] Didn’t put it in the title, because I figured that’d be too long of a title.
I’d say the probability that some authority figure would use an order-following AI to get torturous revenge on me (probably for being part of a group they dislike) is quite slim. Maybe one in a few thousand, with more extreme suffering being less likely by a few more orders of magnitude? The probability that they have me killed for instrumental reasons, or otherwise waste the value of the future by my lights, is much higher: ten percent-ish, depending on my distribution over who’s giving the orders. But this isn’t any worse to me than being killed by an AI that wants to replace me with molecular smiley faces.
To me, those odds each seem optimistic by a factor of about 1000, but ~reasonable relative to each other.
(I don’t see any low-cost way to find out why we disagree so strongly, though. Moving on, I guess.)
But this isn’t any worse to me than being killed [...]
Makes sense (given your low odds for bad outcomes).
Do you also care about minds that are not you, though? Do you expect most future minds/persons that are brought into existence to have nice lives, if (say) Donald “Grab Them By The Pussy” Trump became god-emperor (and was the one deciding what persons/minds get to exist)?
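One hedged way to make sense of the “optimistic by a factor of about 1000” above, given how different in size the two estimates are, is to read the factor on the odds scale rather than the probability scale (that reading is my assumption, not something either commenter states):

$$
p = \tfrac{1}{3000}:\ \text{odds} \approx \tfrac{1}{3000} \xrightarrow{\times 1000} \tfrac{1}{3} \Rightarrow p \approx 25\%,
\qquad
p = 0.1:\ \text{odds} = \tfrac{1}{9} \xrightarrow{\times 1000} \tfrac{1000}{9} \Rightarrow p \approx 99\%.
$$

Multiplying the 10% figure itself by 1000 would exceed 1, so a raw-probability reading of the factor only works for the smaller estimate.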