You could have an agent that cares about what an idealised counterfactual human would think about its decisions (if the idealised human had a huge amount of time to think them over). Compare with Paul Christiano’s ideas on approval-directed agents.
Now, this isn’t safe, but it’s at least something you might be able to play with.
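Just to make the idea concrete, here’s a minimal sketch of what such an agent’s decision rule might look like. Everything in it is a placeholder assumption: `IdealisedHumanModel` and its `judge` function stand in for a model of idealised counterfactual human deliberation that nobody actually knows how to build, and `deliberation_budget` is a crude proxy for “a huge amount of time to think”.

```python
# Sketch of an agent whose objective is defined entirely by the judgment
# of an idealised counterfactual human, not by any direct world objective.
# All names here are hypothetical placeholders for illustration only.

from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class IdealisedHumanModel:
    """Hypothetical model of an idealised counterfactual human.

    `judge` maps (action, deliberation_budget) to an approval score;
    a larger budget stands in for more time to think the decision over.
    """
    judge: Callable[[Any, int], float]


def choose_action(actions: Iterable[Any],
                  human: IdealisedHumanModel,
                  deliberation_budget: int = 10**6) -> Any:
    """Pick the action the idealised human would most approve of."""
    return max(actions, key=lambda a: human.judge(a, deliberation_budget))


# Toy usage: a stand-in judgment function that happens to prefer
# shorter plans, just to show the loop running end to end.
toy_human = IdealisedHumanModel(judge=lambda action, budget: -len(str(action)))
print(choose_action(["long elaborate plan", "short plan"], toy_human))
```

The point of the sketch is only that the agent’s preferences live entirely inside the (counterfactual) human’s judgment; all the hard and unsolved parts are hidden inside `judge`.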