There is one more fact that is ignored in IRL: that humans often have contradictory values. For example, if I want a cake very much but also have a strong inclination toward dieting, I will do nothing. So I have two values which exactly cancel each other out and have no net effect on behaviour. Observing behaviour alone gives no clue that either of them exists. More complex examples are possible, where contradictory values create inconsistent behaviour, which is very typical of biological humans.
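To make the cancellation concrete, here is a minimal sketch; the reward numbers and the Boltzmann-rational choice model are my own illustrative assumptions, not anything IRL prescribes. Two different value decompositions produce exactly the same observable behaviour, so an observer watching behaviour alone cannot tell them apart:

```python
import numpy as np

actions = ["eat_cake", "skip_cake"]

# Agent A: strong craving (+5) exactly offset by a dieting penalty (-5).
reward_conflicted = np.array([5.0 - 5.0, 0.0])   # net reward per action

# Agent B: genuinely indifferent to cake.
reward_indifferent = np.array([0.0, 0.0])

def boltzmann_policy(rewards, beta=1.0):
    """Softmax (Boltzmann-rational) action distribution over rewards."""
    exp_r = np.exp(beta * rewards)
    return exp_r / exp_r.sum()

# Both reward structures induce the identical 50/50 policy, so behaviour
# alone cannot distinguish the conflicted agent from the indifferent one.
print(boltzmann_policy(reward_conflicted))   # [0.5 0.5]
print(boltzmann_policy(reward_indifferent))  # [0.5 0.5]
```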
I’m not sure IRL actually ignores this, although in such a case the value learning agent may never converge on a consistent policy.