Inverse reinforcement learning on self, pre-ontology-change

Inverse reinforcement learning is the challenge of constructing a value system that “explains” the behaviour of another agent. Part of the idea is to have algorithms infer human values from observations of human behaviour.

It struck me that this could be used by an agent on itself. Imagine we had a diamond-maximising agent, who believed something like classical Greek “science”, and acted so as to accumulate the maximal number of these shiny crystals. However, they have an ontology change, and learn quantum physics. This completely messes up their view of what a “diamond” is.

However, what if they replayed their previous behaviour, and tried to deduce what possible utility function, in a quantum world, could explain what they had done? They would be trying to fit a quantum-world-aware utility to the decisions of a non-quantum-world-aware being.
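A minimal sketch of this fitting procedure, with everything invented for illustration: the recorded decisions, the toy “quantum” ontology (options described as counts of carbon-lattice versus silicate atoms), and the candidate linear utilities. The idea is just to score each candidate utility by how well maximising it reproduces the agent's old choices, and keep the best-scoring one.

```python
# Hypothetical sketch: fit a new-ontology utility to decisions recorded
# under the old ontology. All names, ontologies and weights are toy
# assumptions, not a real IRL algorithm.

# Each recorded decision: (options available, option chosen), described
# in old-ontology terms ("shiny crystals").
old_decisions = [
    (["grab_crystal", "do_nothing"], "grab_crystal"),
    (["grab_crystal", "grab_glass"], "grab_crystal"),
    (["grab_glass", "do_nothing"], "do_nothing"),
]

# Re-description of each option in the new (quantum) ontology, as
# feature counts: (carbon_lattice_atoms, silicate_atoms).
new_ontology = {
    "grab_crystal": (1000, 0),  # a diamond: carbon lattice
    "grab_glass":   (0, 1000),  # shiny, but silicate
    "do_nothing":   (0, 0),
}

# Candidate linear utilities over the new-ontology features.
candidate_weights = [(1, 0), (0, 1), (1, 1), (0, 0)]

def fit(decisions, candidates):
    """Return the candidate utility that best reproduces the decisions."""
    def score(weights):
        w_carbon, w_silicate = weights
        total = 0.0
        for options, chosen in decisions:
            def u(opt):
                carbon, silicate = new_ontology[opt]
                return w_carbon * carbon + w_silicate * silicate
            best_val = max(u(o) for o in options)
            if u(chosen) < best_val:
                continue  # this utility disagrees with the recorded choice
            ties = [o for o in options if u(o) == best_val]
            # Full credit for an unambiguous match, half credit for a tie.
            total += 1.0 if ties == [chosen] else 0.5
        return total
    return max(candidates, key=score)

print(fit(old_decisions, candidate_weights))  # → (1, 0)
```

Here the fit recovers the utility that values carbon lattices: it is the only candidate that explains both grabbing the crystal over the glass and ignoring the glass when nothing better was on offer.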

This could possibly result in a useful extension of the original motivation to the new setup (at least, it would guarantee similar behaviour in similar circumstances). There are many challenges—most especially that a quantum-aware being has far more knowledge about how to affect the world, and thus far more options—but they seem to be the usual sorts of inverse reinforcement learning challenges (partial knowledge, noise, etc.).