Yeah, I would be very nervous about making an exception to my assistant’s corrigibility. Ultimately, it would be prudent to be able to make some hard commitments after thinking very long and carefully about how to do that. In the meantime, here are a couple corrigibility-preserving commitment mechanisms off the top of my head:
Escrow: Put resources in a dumb incorrigible box that releases them under certain conditions.
The AI can make very short-lived incorrigible commitments within atomic actions (like completing a purchase).
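The escrow idea can be sketched as a tiny state machine: resources go into a "dumb box" whose release condition is frozen at creation and can't be renegotiated afterwards. This is a minimal, hypothetical Python sketch (the `Escrow` class and its condition callback are illustrative, not any real API), assuming "resources" are just a numeric amount:

```python
import time


class Escrow:
    """A 'dumb incorrigible box': holds an amount and releases it only
    when a condition fixed at creation time holds. The rigidity is the
    point -- no one (including the AI) can change the condition later."""

    def __init__(self, amount, release_condition):
        self._amount = amount
        self._release_condition = release_condition  # frozen at creation
        self._released = False

    def try_release(self):
        """Release the held amount if the condition holds, once only."""
        if not self._released and self._release_condition():
            self._released = True
            return self._amount
        return None


# Illustrative use: funds locked until a deadline has passed.
deadline = time.time() - 1  # already in the past, so release succeeds
escrow = Escrow(100, release_condition=lambda: time.time() > deadline)
escrow.try_release()  # releases 100
escrow.try_release()  # subsequent calls release nothing
```

The commitment force comes entirely from the box being outside the AI's (and your) control once created, which is why it preserves the AI's own corrigibility.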
Are these enough to maintain competitiveness?
This seems like a role for the law. Something like corrigibility, except where it would mean breaking the law. That seems reasonable at first glance, but I know too little about law in different countries to judge how uncompetitive it would make the AIs.
(There’s also a risk of giving too much power to the legislative authority in your country, if you’re worried about that kind of thing)
Although I could imagine something like a modern-day VPN that makes your AI believe it's in another country, so you can get it to do something that's illegal where you actually are. That's bad in a country with useful laws and good in a country under an authoritarian regime.