About the guarantees, now that you point it out, the two sentences indeed have different subjects.
About the 3, makes sense that myopia is the most important part
For evaluation vs imitation, I think we might be meaning two different things with richer. I mean that the content of the signal itself has more information and more structure, whether I believe you mean that it applies to more situations and is more general. Is that a good description of your intuition, or am I wrong here?
For the difference between reward learning + maximization and imitation, you’re right, I forgot that most people and systems are not necessarily optimal for their observable reward function. Even if they are, I guess the way the reward generalizes to new environment might differ from the way the imitation differs.
Thanks for the answers.
About the guarantees, now that you point it out, the two sentences indeed have different subjects.
About the 3, makes sense that myopia is the most important part
For evaluation vs imitation, I think we might be meaning two different things with richer. I mean that the content of the signal itself has more information and more structure, whether I believe you mean that it applies to more situations and is more general. Is that a good description of your intuition, or am I wrong here?
For the difference between reward learning + maximization and imitation, you’re right, I forgot that most people and systems are not necessarily optimal for their observable reward function. Even if they are, I guess the way the reward generalizes to new environment might differ from the way the imitation differs.