Since LeCun’s architecture, taken as a whole, is a kind of optimizer (I agree with Algon that it’s probably a utility maximizer), the emergence of additional mesa-optimizers seems less likely.
We expect optimization to emerge because it’s a powerful algorithm for SGD to stumble on that outcompetes the alternatives. But if the system is already an optimizer, then where is that selection pressure coming from to make another one?
It’s coming from the fact that every module wants to be an optimizer of something in order to do its job.
Interesting, I wonder how the dynamics of a multiple mesa-optimizer system would play out (if it’s possible).