Thanks to Anna Salamon for the idea of making an AI that cares about what happens in a counterfactual ideal world, rather than the real world with the transistors in it, as a corrigibility strategy. I haven’t yet been able to find a way to make that idea work for an agent/utility maximizer, but it inspired the idea of doing the same thing in an oracle.
You could have an agent that cares about what an idealised counterfactual human would think about its decisions (if the idealised human had a huge amount of time to think them over). Compare with Paul Christiano’s ideas.
Now, this isn’t safe, but it’s at least something you might be able to play with.
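As a toy illustration only (none of these names come from the post, and the "idealised judge" is just a stand-in for whatever approximation you'd actually have to build), here is the rough shape of such an agent in Python:

```python
from typing import Callable, Iterable

def choose_action(
    actions: Iterable[str],
    idealised_judge: Callable[[str], float],
) -> str:
    """Pick the action the counterfactual idealised human would rate highest.

    `idealised_judge` stands in for "what the idealised human would conclude
    after a huge amount of time to think it over". In practice this would
    itself have to be approximated, which is where the safety problems live.
    """
    return max(actions, key=idealised_judge)

if __name__ == "__main__":
    # Toy stand-in for the idealised judge: prefers shorter answers.
    toy_judge = lambda plan: -len(plan)
    print(choose_action(["verbose plan", "plan"], toy_judge))
```

The point of the sketch is only that the objective refers to the judgment of a counterfactual idealised human, not to any feature of the physical world the agent sits in.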