You could have an agent that cares about what an idealised counterfactual human would think about its decisions (if the idealised human had a huge amount of time to think them over). Compare with Paul Christiano’s ideas on approval-directed agents.
Now, this isn’t safe, but it’s at least something you might be able to play with.
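Just to make the idea concrete, here’s a minimal sketch of what such an agent’s decision rule might look like. Everything in it is a placeholder assumption: `IdealisedHumanModel` and its `judge` function stand in for a model of idealised counterfactual human deliberation that nobody actually knows how to build, and `deliberation_budget` is a crude proxy for “a huge amount of time to think”.

```python
# Sketch of an agent whose objective is defined entirely by the judgment
# of an idealised counterfactual human, not by any direct world objective.
# All names here are hypothetical placeholders for illustration only.

from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class IdealisedHumanModel:
    """Hypothetical model of an idealised counterfactual human.

    `judge` maps (action, deliberation_budget) to an approval score;
    a larger budget stands in for more time to think the decision over.
    """
    judge: Callable[[Any, int], float]


def choose_action(actions: Iterable[Any],
                  human: IdealisedHumanModel,
                  deliberation_budget: int = 10**6) -> Any:
    """Pick the action the idealised human would most approve of."""
    return max(actions, key=lambda a: human.judge(a, deliberation_budget))


# Toy usage: a stand-in judgment function that happens to prefer
# shorter plans, just to show the loop running end to end.
toy_human = IdealisedHumanModel(judge=lambda action, budget: -len(str(action)))
print(choose_action(["long elaborate plan", "short plan"], toy_human))
```

The point of the sketch is only that the agent’s preferences live entirely inside the (counterfactual) human’s judgment; all the hard and unsolved parts are hidden inside `judge`.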