MSRayne comments on [Intro to brain-like-AGI safety] 5. The “long-term predictor”, and TD learning

MSRayne 16 Jun 2022 19:12 UTC
1 point
My vague, uneducated intuition on the matter is that it has something to do with surprise. More specifically, that a pleasant event that is unexpected is intrinsically higher valence / more rewarding, for some reason, than a pleasant event that is expected. I don’t know why this would be the case or how it works in the brain but it fits with my life experience pretty well and likely yours too. (In the same way, an unexpected bad event feels far worse than an expected bad event in most cases.)
Then on a fixed-rate schedule the entity will quickly learn to predict each reward, and so will find each one less rewarding. On a variable-rate schedule, by contrast, the rewards are harder to predict and thus more compelling.
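To make that concrete, here is a rough toy sketch in Python. It is not a claim about how the brain does it. I am swapping in a plain prediction-error rule (surprise = reward minus predicted reward, with the prediction nudged toward what actually happened), and the function name, the `ratio` of 5, the learning rate `lr`, and the idea of keying predictions on "presses since the last reward" are all assumptions I made up for illustration.

```python
import random


def mean_surprise_at_reward(schedule, n_steps=20000, ratio=5, lr=0.1, seed=0):
    """Average prediction error on rewarded presses, after a learning period.

    schedule: "fixed"    -> reward on every `ratio`-th press
              "variable" -> reward on each press with probability 1/ratio
    All names and parameter values here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    # prediction[k] = predicted reward for a press made k presses after the last reward
    prediction = {}
    since_last = 0
    errors = []
    for step in range(n_steps):
        if schedule == "fixed":
            reward = 1.0 if since_last == ratio - 1 else 0.0
        else:
            reward = 1.0 if rng.random() < 1.0 / ratio else 0.0

        pred = prediction.get(since_last, 0.0)
        delta = reward - pred                      # the "surprise" (prediction error)
        prediction[since_last] = pred + lr * delta  # move the prediction toward the outcome

        # Only record surprise on rewarded presses in the second half,
        # so the learner has had time to settle.
        if reward > 0 and step >= n_steps // 2:
            errors.append(delta)

        since_last = 0 if reward > 0 else since_last + 1
    return sum(errors) / len(errors)


print("fixed:   ", mean_surprise_at_reward("fixed"))
print("variable:", mean_surprise_at_reward("variable"))
```

On the fixed schedule the learner ends up predicting the reward almost perfectly, so the surprise on each reward should shrink toward zero. On the variable schedule the best it can do is predict the average payout rate, so each reward that does arrive stays mostly unexpected.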
But that just pushes the question back a step: why is the unpredictability of an event a multiplicative factor in the equation determining its reward, magnifying highs and lows? What evolutionary purpose does that serve, if it is true, and how is it implemented in the brain? I’m not sure.
Hmm, maybe this (if accurate) is how curiosity and risk-aversion are implemented? Heck, maybe they’re both the same drive, an emergent result of this amplification that uncertainty hypothetically causes. Since unexpected rewards are more rewarding, entities will seek out environments in which unexpected good events are more likely to occur, e.g. novel environments (but not so novel that they are predicted to be unsafe). Meanwhile, entities will avoid environments in which unexpected bad events are likely to occur, and will tend to minimize risk. (This would mean that your general prediction about the valence of novel things has a large effect on whether novelty is more or less compelling to you than familiarity. The balance between sensitivity to good surprises and sensitivity to bad surprises might then be a hyperparameter that differs between individuals: bear versus bull, etc.) But that’s all just conjecture.