Yes. (And they can learn to predict and estimate the reward too, achieving even higher reward than naive trial-and-error optimization of the reward. For example, if you included an input which indicated which arm had the reward, the RNN would learn to use that, and so would be able to change its decision without experiencing a single negative reward. A REINFORCE or evolution-strategies meta-trained RNN would have no problem learning such a policy, which attempts to learn or infer the reward each episode in order to choose the right action.)
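As an illustrative sketch (not from the original; PyTorch, the GRU architecture, and all hyperparameters are my assumptions), here is roughly what such a meta-trained bandit RNN looks like: a recurrent policy is trained by REINFORCE across many 2-armed bandit episodes, where each episode resamples which arm pays and feeds the policy a "hint" input revealing it.

```python
import torch
import torch.nn as nn

class BanditRNN(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        # Per-step input: one-hot previous action (2), previous reward (1),
        # and a hint revealing which arm pays this episode (2).
        self.gru = nn.GRUCell(5, hidden)
        self.head = nn.Linear(hidden, 2)   # logits over the 2 arms

    def forward(self, x, h):
        h = self.gru(x, h)
        return self.head(h), h

def run_episode(policy, steps=10):
    good_arm = torch.randint(2, (1,)).item()  # rewarding arm, resampled per episode
    hint = torch.eye(2)[good_arm]             # ...and revealed to the policy as an input
    h = torch.zeros(1, policy.gru.hidden_size)
    prev_action, prev_reward = torch.zeros(2), torch.zeros(1)
    log_probs, rewards = [], []
    for _ in range(steps):
        x = torch.cat([prev_action, prev_reward, hint]).unsqueeze(0)
        logits, h = policy(x, h)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        r = 1.0 if action.item() == good_arm else 0.0
        log_probs.append(dist.log_prob(action))
        rewards.append(r)
        prev_action = torch.eye(2)[action.item()]
        prev_reward = torch.tensor([r])
    return torch.stack(log_probs), torch.tensor(rewards)

policy = BanditRNN()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
baseline = 0.0
for episode in range(2000):
    log_probs, episode_rewards = run_episode(policy)
    ret = episode_rewards.sum()
    baseline = 0.99 * baseline + 0.01 * ret.item()  # running-mean baseline
    loss = -log_probs.sum() * (ret - baseline)      # REINFORCE on the meta-objective
    opt.zero_grad(); loss.backward(); opt.step()
```

After meta-training, the policy picks the hinted arm on the first trial of a fresh episode; it changes its decision without ever sampling the unrewarding arm, because the within-episode "learning" is pure inference by the frozen weights. Dropping the hint input recovers the standard meta-bandit, where the policy must instead explore briefly and then exploit.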
Nor is it at all guaranteed that ‘the dog will wag the tail’ - depending on circumstances, the tail may successfully wag the dog indefinitely. Maybe the outer level will be able to override the inner, maybe not: the outer level may no longer exist, may be too slow to be relevant, or may itself be changed (especially by the inner level). The ‘homunculus’ or ‘Cartesian boundary’ we draw around each level doesn’t actually exist; it’s just a convenient, leaky abstraction.
To continue the human example: we were created by evolution acting on genes, but within a lifetime, evolution has no effect on the policy. Even if evolution ‘wants’ to modify a human brain to do something other than what that brain does, it cannot operate within-lifetime (except at even lower levels of analysis, like cancers or cell lineages); if the human brain is a digital emulation of a brain snapshot, it is no longer affected by evolution at all; and even if evolution does begin to mold human brains, it is such a slow, high-variance optimizer that it might take hundreds of thousands or millions of years… and there probably won’t even be biological humans by that point, never mind the rapid progress over the next 1-3 generations in ‘seizing the means of reproduction’, if you will. (As has been pointed out in the context of Von Neumann probes or gray goo, if you add in error-correction, it is entirely possible to make replication so reliable that the universe will burn out before any meaningful amount of evolution can happen, per the Price equation. The lightspeed delay to colonization also implies that ‘cancers’ will struggle to spread much if they take more than a handful of generations.)
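For reference (a standard statement of the Price equation, not anything beyond the citation in the original argument): the equation decomposes the per-generation change in the average value of a trait $z$ into a selection term and a transmission term,

$$\Delta \bar{z} = \frac{\operatorname{Cov}(w_i, z_i)}{\bar{w}} + \frac{\operatorname{E}(w_i \, \Delta z_i)}{\bar{w}},$$

where $w_i$ is the fitness of replicator $i$ and $\Delta z_i$ is the change in the trait between a replicator and its copies. Sufficiently strong error-correction makes every copy identical, so the trait has no variance and $\operatorname{Cov}(w_i, z_i) = 0$, while perfect transmission makes $\Delta z_i = 0$; both terms vanish, and no evolution occurs no matter how many replication cycles elapse.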