What if an AI was rewarded for being more predictable to humans? Give it a primary goal (make more paperclips!) but also a secondary goal: minimize the prediction error of its human overseers, with its overall utility defined as the minimum of these two utilities. This is almost certainly horribly wrong somehow, but I don't know how. The idea, though, is that the AI would never take an action that a human couldn't predict it would take. Though, if the humans predicted it would try to take over the world, that's kind of a problem… this idea is more like a quarter baked than a half lol.
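
Roughly, a toy sketch of that min-of-two-utilities idea might look like the snippet below. The normalization of both terms to [0, 1] and the reading of "predictability" as 1 minus the overseers' prediction error are my own assumptions for illustration, not anything worked out:

```python
def combined_utility(task_utility: float, predictability: float) -> float:
    """Toy sketch: the agent's score is capped by how predictable it is.

    Assumed (my assumption) that both inputs are normalized to [0, 1]:
      task_utility   -- how well the action serves the primary goal (paperclips)
      predictability -- the overseers' predicted probability that the AI
                        would take this action (i.e. 1 - prediction error)
    """
    # The agent only scores as well as the worse of its two objectives,
    # so a high-paperclip but surprising action is worth very little.
    return min(task_utility, predictability)


# A very productive but surprising action scores lower than a
# predictable, moderately productive one.
print(combined_utility(task_utility=0.9, predictability=0.1))  # 0.1
print(combined_utility(task_utility=0.6, predictability=0.8))  # 0.6
```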