I still think that the hot stove example is a real problem, although maybe unavoidable. My example starts with “I learned that the hot stove always burns my hand.” This is not the exploration part anymore, the agent already observed the stove burning its hand many times. Normally, this would be enough to never touch the hot stove again, but if some unexplained nice things happen in the outside world, there is suddenly no guarantee that the IB agent doesn’t start touching the stove again. Maybe this is unavoidable, but I maintain it’s a weird behavior pattern that the outside weather might make you lose the guarantee to not touch the stove you already learned is burning. I think it’s somewhat different than the truly unavoidable thing, that every agent needs to do some exploration and it sometimes leads to them burning their hand.
I still think that the hot stove example is a real problem, although maybe unavoidable. My example starts with “I learned that the hot stove always burns my hand.” This is not the exploration part anymore, the agent already observed the stove burning its hand many times. Normally, this would be enough to never touch the hot stove again, but if some unexplained nice things happen in the outside world, there is suddenly no guarantee that the IB agent doesn’t start touching the stove again. Maybe this is unavoidable, but I maintain it’s a weird behavior pattern that the outside weather might make you lose the guarantee to not touch the stove you already learned is burning. I think it’s somewhat different than the truly unavoidable thing, that every agent needs to do some exploration and it sometimes leads to them burning their hand.