one obviously true consideration i failed to raise was that neural networks change many of their weights at once per update, in contrast to natural selection, which can only change one thing at a time. this means gradient descent lacks evolution’s constraint that every change to the structure in question must be useful in its own right in order to spread through the population. deep learning systems could therefore build complex algorithms that require multiple computational steps before becoming useful, in a way evolution couldn’t. this probably gives gradient descent access to a broader class of algorithms, potentially including dangerous mesa-optimizers.
i still think llm updates not being generated for generality is a significant reason for hope, though.
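to make the contrast above concrete, here’s a toy sketch (illustrative only, not anyone’s actual training setup, on a made-up quadratic loss): gradient descent moves every weight on every step, while a mutation-and-selection loop changes one weight at a time and only keeps changes that are beneficial on their own.

```python
# Toy comparison of the two update rules on a quadratic loss (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=8)                  # the weights the loss rewards
loss = lambda w: float(np.sum((w - target) ** 2))

# Gradient descent: every weight moves on every update, with no requirement
# that any individual coordinate change be beneficial on its own.
w_gd = np.zeros(8)
for _ in range(100):
    grad = 2 * (w_gd - target)               # analytic gradient of the quadratic loss
    w_gd -= 0.05 * grad                      # all 8 coordinates change at once

# Evolution-style hill climbing: mutate one weight at a time, and keep the
# mutation only if it improves fitness by itself.
w_evo = np.zeros(8)
for _ in range(100):
    i = rng.integers(8)
    candidate = w_evo.copy()
    candidate[i] += rng.normal(scale=0.1)    # a single "mutation"
    if loss(candidate) < loss(w_evo):        # spreads only if useful in its own right
        w_evo = candidate

print(loss(w_gd), loss(w_evo))
```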
I’m not sure that’s quite right. A genetic mutation is “one thing,” but it can easily have many different effects, especially once you consider that it’s active for an entire lifetime.
And doesn’t gradient descent also demand that each weight update be beneficial? At a much finer grain than evolution, even… then again, I guess there’s grokking, so I’m not sure.
While a single mutation can indeed have many effects, empirically speaking (at least for eukaryotes) genetic mutations are mostly modular and limited to one specific thing by default, rather than affecting everything else in a tangled way. Genetics research has found that genetic effects are mostly linear and compositional: the best way to predict what happens when you combine two genes is to sum their individual effects, not to model a nonlinear interaction between them.
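To picture what “linear and compositional” means here, a toy additive model (hypothetical variant names and effect sizes, not real data) predicts a trait by summing per-variant effects with no interaction terms:

```python
# Hypothetical effect sizes for illustration only.
effect_sizes = {"variant_A": 0.30, "variant_B": -0.10, "variant_C": 0.05}

def predict_trait(genotype: dict[str, int], baseline: float = 1.0) -> float:
    """Additive (linear) prediction: baseline plus the sum of each carried
    variant's effect times its copy number, with no interaction terms."""
    return baseline + sum(effect_sizes[v] * count for v, count in genotype.items())

# Carrying A and B together is predicted by simply adding their effects:
print(predict_trait({"variant_A": 1, "variant_B": 1}))  # 1.0 + 0.30 - 0.10 = 1.2
```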