$100/$50 rewards for good references

With thanks to Rohin Shah.

Dear LessWrongers, this is an opportunity to make money and help with AI alignment.

We’re looking for work on a specific AI capability. Has anyone published on the following subject?

  • Generating multiple reward functions or policies from the same set of challenges. Have there been designs, for deep learning or similar, in which the agent produces multiple independent reward functions (or policies) to explain the same reward function or behaviour?

For example, in CoinRun, the agent must get to the end of the level, on the right, to collect the coin. It only gets the reward for collecting the coin.

That is the “true” reward. But since the coin is all the way to the right, as far as the agent knows, “go to the far right of the level” could just as well have been the true reward.

We’d want some design that generated both of these reward functions (and, in general, generated multiple reward functions whenever there are several independent candidates). Alternatively, the design might generate two independent policies; we could test these by putting the coin in the middle of the level and seeing what each policy decides to do.
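To make the ambiguity and the proposed test concrete, here is a minimal toy sketch. It assumes a hypothetical one-dimensional stand-in for CoinRun: the `Level` class, the two reward functions and the `greedy_stop` helper are illustrative names, not taken from any CoinRun codebase. It only shows that the two candidate rewards agree on every training level and come apart once the coin is moved to the middle. It is not the kind of design we’re asking for, since both reward functions are written by hand here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Level:
    """Hypothetical 1D simplification of a CoinRun level."""
    length: int    # number of cells; the agent starts at cell 0
    coin_pos: int  # cell containing the coin


RewardFn = Callable[[Level, int], float]


def coin_reward(level: Level, pos: int) -> float:
    """The 'true' reward: 1 for reaching the coin, 0 otherwise."""
    return 1.0 if pos == level.coin_pos else 0.0


def far_right_reward(level: Level, pos: int) -> float:
    """The alternative explanation: 1 for reaching the far right, 0 otherwise."""
    return 1.0 if pos == level.length - 1 else 0.0


def agree_on(levels: List[Level], r1: RewardFn, r2: RewardFn) -> bool:
    """True if r1 and r2 give identical rewards at every cell of every level."""
    return all(
        r1(level, pos) == r2(level, pos)
        for level in levels
        for pos in range(level.length)
    )


def greedy_stop(level: Level, reward: RewardFn) -> int:
    """Cell where an agent walking right from cell 0 stops: the first rewarded cell."""
    for pos in range(level.length):
        if reward(level, pos) > 0:
            return pos
    return level.length - 1


# Training levels: the coin is always at the far right, as in CoinRun.
train_levels = [Level(length=n, coin_pos=n - 1) for n in (5, 8, 12)]
# Held-out level: the coin is moved to the middle.
moved_coin_level = Level(length=10, coin_pos=5)

print(agree_on(train_levels, coin_reward, far_right_reward))      # True
print(agree_on([moved_coin_level], coin_reward, far_right_reward))  # False
print(greedy_stop(moved_coin_level, coin_reward),
      greedy_stop(moved_coin_level, far_right_reward))            # 5 9
```

On the training levels no amount of data separates the two rewards; only the moved-coin level does, which is why a design that surfaces both candidates up front would be useful.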

We’re not interested in a Bayesian approach that lists a bunch of candidate reward functions and then updates down to just those two (that’s trivially easy to do). Nor are we interested in an IRL-style approach that lists “features”, including the coin and the right-hand side.

What we’d want is some neural-net-style design that generates the coin reward and the move-right reward just from the game data, without any previous knowledge of the setting.

So, does anyone know any references for that kind of work?

We will pay $50 for the first relevant reference submitted, and $100 for the best reference.

Thanks!