I have a hunch that the Grue Bleen Problem for Reward Functions will start to become less complicated as “reward function inference” becomes more tractable.
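To make "reward function inference" a little more concrete, here is a minimal sketch, not the method any particular system uses, of fitting reward weights from pairwise human preferences under a Bradley-Terry / Boltzmann-rational choice model. The feature dimension, the "true" weights, and the synthetic comparisons are all illustrative assumptions.

```python
# Hypothetical sketch: infer a linear reward from pairwise preference data.
import numpy as np

rng = np.random.default_rng(0)

true_w = np.array([1.0, -2.0, 0.5, 0.0])   # hidden "true" reward weights (assumed)
phi_a = rng.normal(size=(500, 4))          # features of option A in each comparison
phi_b = rng.normal(size=(500, 4))          # features of option B

# Simulated human choices: A is preferred with prob sigmoid(true reward difference).
p_true = 1.0 / (1.0 + np.exp(-(phi_a - phi_b) @ true_w))
prefs = rng.random(500) < p_true           # True where A was chosen

w = np.zeros(4)                            # reward weights to infer
lr = 0.5
for _ in range(2000):
    diff = (phi_a - phi_b) @ w
    p_a = 1.0 / (1.0 + np.exp(-diff))
    # Gradient of the Bradley-Terry log-likelihood with respect to w.
    grad = ((prefs - p_a)[:, None] * (phi_a - phi_b)).mean(axis=0)
    w += lr * grad

print("true weights:    ", true_w)
print("inferred weights:", np.round(w, 2))
```

The point of the toy is only that, once you commit to a choice model like this, "what reward does this person have?" becomes an ordinary estimation problem rather than a purely philosophical one.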
At an aggregate level, preference modeling already happens at scale through the personalization of recommendation engines; at a small scale it comes up in text interpretation (for example, coreference resolution) under the paradigm where words are treated as moves in a cooperative game between two communicating agents who “want to pay attention to the same things” but also “want to pay attention to the things they already care about.”
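Here is a toy, purely hypothetical sketch of that cooperative-game framing: a listener scores candidate referents by mixing how well each one matches the utterance (shared attention) with how much the listener already cares about it (prior interests). The referents, scores, and mixing weight below are made up for illustration, not taken from any real system.

```python
# Toy referent resolution as a trade-off between shared attention and own interests.
import numpy as np

referents = ["the model", "the dataset", "the deadline"]

# How well each candidate matches the speaker's utterance (assumed values).
literal_match = np.array([0.8, 0.6, 0.05])

# The listener's prior interest in each topic, independent of this utterance.
listener_prior = np.array([0.2, 0.7, 0.1])

alpha = 0.7   # weight on "same things" vs. "things they already care about"

scores = alpha * np.log(literal_match) + (1 - alpha) * np.log(listener_prior)
posterior = np.exp(scores) / np.exp(scores).sum()

for r, p in zip(referents, posterior):
    print(f"{r}: {p:.2f}")
```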
As practical methods for modeling human preferences evolve under engineering constraints, I suspect that it will become easier to talk about how these preferences change (or do not change) in very concrete ways.