Well, nothing in the paper has to do with MDPs! The results are for general computable environments. Does that answer the question?
Hmm, I think I get it. Correct me if I’m wrong.
Your paper is about an agent which can perform well in any possible universe. (That’s the “for all ν in ℳ”). That includes universes where the laws of physics suddenly change tomorrow. But in real life, I know that the laws of physics are not going to change tomorrow. Thus, I can get optimal results without doing the kind of exhaustive exploration that your paper is talking about. Agree or disagree?
Certainly for the true environment, the optimal policy exists and you could follow it. The only thing I’d say differently is that you’re pretty sure the laws of physics won’t change tomorrow. But more realistic forms of uncertainty doom us to either forgo knowledge (and potentially good policies) or destroy ourselves. If one slowed down science in certain areas for reasons along the lines of the vulnerable world hypothesis, that would be taking the “safe stance” in this trade-off.
Thanks!
A decent intuition might be to think about what exploration looks like in human children. Children under the age of 5 but old enough to move about on their own (so toddlers, not babies or “big kids”) face a lot of dangers in the modern world if they are allowed to run their natural exploration algorithm. Heck, I’m not even sure this is a modern problem: toddlers don’t understand electrical sockets and moving vehicles and need to be protected from exploring them, but they also have to be protected from more traditional dangers that they would definitely otherwise check out, like dangerous plants and animals. Of course, since toddlers grow up into powerful adult humans, this is a kind of evidence that they are good enough explorers (even with protections) to become powerful enough to function in society.
Obviously there are a lot of caveats to taking this idea too seriously, since I’ve ignored issues related to human development, but I think it points toward something everyday that reflects this result.
The last paragraph of the conclusion (maybe you read it?) is relevant to this.