The gradient descent analogy breaks down because GD has fine-grained parameter control, unlike evolution's coarse genomic selection.
This just misses the point entirely. The problem is that your value function is under-specified by your training data: the space of possible value functions is large, and relying on learned inductive biases probably doesn't save you, because value functions are a different kind of thing from predictive models. Whether or not you're adjusting parameters individually changes none of this.
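The under-specification point can be sketched concretely. Below is a minimal, hypothetical illustration (the functions and numbers are invented for this sketch, not from the comment): two candidate value functions that agree on every training point yet diverge sharply off-distribution, so the training data alone cannot distinguish between them.

```python
# Hypothetical illustration of value-function under-specification:
# two functions that fit the same training data exactly but disagree
# on states outside it.

training_states = [0.0, 1.0, 2.0, 3.0]

def value_a(s):
    # "Intended" value function: linear in the state.
    return 2.0 * s

def value_b(s):
    # Alternative hypothesis: the added polynomial term vanishes at
    # every training state (it has roots at 0, 1, 2, 3), so this
    # function matches value_a on all of the training data.
    return 2.0 * s + 100.0 * s * (s - 1) * (s - 2) * (s - 3)

# Both fit the training data perfectly...
assert all(value_a(s) == value_b(s) for s in training_states)

# ...but assign very different values to a novel state.
print(value_a(2.5), value_b(2.5))  # 5.0 vs. -88.75
```

Finer- or coarser-grained parameter updates don't help here: both hypotheses achieve zero training loss, so no optimizer, however precise, can tell them apart from the data alone.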