As I mentioned in the conclusion, I hope to write more in the near future about whether (and how) this pessimistic argument breaks down for certain non-behaviorist reward functions.
But to be clear, the pessimistic argument also applies perfectly well to at least some non-behaviorist reward functions, e.g. curiosity drive. So I partly agree with you.