I think the really interesting interaction between these two frames is when selection pressures lead to predictive capacities. When does this happen? A first guess might be: when the training (selecting) environment is so complicated, and has so much local variance, that the selective loop finds it easiest to instill a predictive agent and let that agent take care of the local adaptation.
A lot of things work like this: you can have generic chess/math heuristics, but you still need to be able to do local calculations so you don't fall flat on your face; evolution more or less works like this in mammals, and obviously in humans (maybe much more so); presumably LLMs work like this; and our central nervous system/mind works like this with respect to the individual cells in the body.
Are there other factors that mediate how a selective process can give rise to local predictive agents? What consequences does this transition have? Cancer, parasites, and fraud are three instances of one pattern; what else?
I have converged (ha!) on similar views recently, and I think it is worth trying to make this a lot more precise. Take a simplified version of a standard ML training setup. We have some dataset D that samples a subset of all possible inputs A, with binary labels in {0,1}, and a neural network architecture that defines a parameter space and an associated function space F of maps A → {0,1}. Each point of this function space restricts to a “labelling function” on D, and in particular there is a subspace S of F that is the set of “correct functions”, i.e., the functions that match the labels on the training set. In general, our optimization algorithms reliably find a point in S; that is, they minimize loss on the training set.
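To make the counting concrete, here is a toy sketch under loudly stated assumptions: inputs are 3-bit strings, F is taken to be every function A → {0,1} (a real architecture realizes only a subset of these), and the three labelled points in D are made up for illustration. Each labelled point halves the set of consistent functions, so S ends up with 2^(8−3) = 32 members.

```python
from itertools import product

# Toy input space A: all 3-bit strings, so |A| = 8.
A = list(product([0, 1], repeat=3))

# Assumption for illustration: F is *every* map A -> {0,1}, stored as a dict.
# A real architecture only realizes a subset of these 2**8 = 256 functions.
F = [dict(zip(A, labels)) for labels in product([0, 1], repeat=len(A))]


def consistent(f, data):
    """True if f reproduces every (input, label) pair in data."""
    return all(f[x] == y for x, y in data)


# Hypothetical training set D: three labelled inputs.
D = [((0, 0, 0), 0), ((1, 1, 1), 1), ((1, 0, 0), 0)]

# Add the training points one at a time and watch the consistent set shrink.
sizes = [len([f for f in F if consistent(f, D[:k])]) for k in range(len(D) + 1)]
print(sizes)  # [256, 128, 64, 32]

S = [f for f in F if consistent(f, D)]  # the "correct functions"
```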
Now there is also a test set, which picks out an even smaller subspace T of S: the functions that also match the labels on the test points. For training to have worked, i.e., for the trained net to generalize correctly, the optimization algorithm must find not just a point in S, but a point in T. So “correct training” naturally depends on these two nested subspaces, T ⊂ S. And the power of neural networks is somehow really that they consistently land in a much smaller subspace than training alone would require (with binary labels, each test data point cuts the size of the space down by around a factor of 2).
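The nested subspaces T ⊂ S can be counted the same way. In this sketch the labels come from a hypothetical majority-vote rule (`majority`), and the training and test inputs are arbitrary choices; the point is only that each extra test point halves the surviving set, so a learner with no inductive bias, picking uniformly from S, would land in T with probability |T|/|S|.

```python
from fractions import Fraction
from itertools import product

A = list(product([0, 1], repeat=3))
# As before, assume F is every map A -> {0,1} (illustration only).
F = [dict(zip(A, labels)) for labels in product([0, 1], repeat=len(A))]


def consistent(f, data):
    """True if f reproduces every (input, label) pair in data."""
    return all(f[x] == y for x, y in data)


def majority(x):
    """Hypothetical ground-truth rule used to label both splits."""
    return int(sum(x) >= 2)


D = [(x, majority(x)) for x in [(0, 0, 0), (1, 1, 1), (1, 0, 0)]]
test = [(x, majority(x)) for x in [(0, 1, 1), (0, 0, 1), (0, 1, 0)]]

S = [f for f in F if consistent(f, D)]
T = [f for f in S if consistent(f, test)]  # T is a subset of S by construction

print(len(S), len(T))  # 32 4
# Chance that a uniformly random pick from S also lies in T:
print(Fraction(len(T), len(S)))  # 1/8
```

With three test points that chance is already 1/8, and it shrinks by another factor of 2 for every further test point.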
Does this help make your convergent abstraction hypothesis more precise? I think so. I think the key point is that datasets are naturally generated by learners (often humans). So if we have a trained net, or a human, who can assign labels to data points, we can generate the training set D by prompting that learner, and similarly for the test set.
Then, if we train a different neural network on the outputs of the first, saying that learning converges behaviorally means that both networks identify the same smaller space T inside S as the “important” one.
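The teacher/student setup above can be sketched roughly as follows. Everything here is an assumption for illustration: the teacher is a fixed linear threshold unit (`TEACHER_W`, `TEACHER_B`) standing in for a trained net, the student is a plain perceptron, and the 24/40 split of the 64 inputs is arbitrary. A training agreement of 1.0 means the student found a point in S; its agreement on the teacher-labelled test set measures how close it came to T.

```python
import random
from itertools import product

random.seed(0)
N_BITS = 6
A = list(product([0, 1], repeat=N_BITS))  # all 64 inputs

# Hypothetical teacher: a fixed linear threshold unit standing in for
# "a trained net or a human who can assign labels".
TEACHER_W = [3, -2, 1, 1, -1, 2]
TEACHER_B = -0.5  # integer dot products + 0.5 offset: no input lies on the boundary


def teacher(x):
    return int(sum(w * xi for w, xi in zip(TEACHER_W, x)) + TEACHER_B > 0)


# Generate D and the test set by querying the teacher on disjoint inputs.
shuffled = random.sample(A, len(A))
D = [(x, teacher(x)) for x in shuffled[:24]]
test = [(x, teacher(x)) for x in shuffled[24:]]


def train_perceptron(data, epochs=1000):
    """Student: a classic perceptron, stopped once it fits every point in data."""
    w, b = [0.0] * N_BITS, 0.0

    def predict(x):
        return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if predict(x) != y:
                step = 1 if y == 1 else -1  # push the score toward the label
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step
                mistakes += 1
        if mistakes == 0:
            break
    return predict


student = train_perceptron(D)


def agreement(f, data):
    """Fraction of (x, y) pairs on which f(x) == y."""
    return sum(f(x) == y for x, y in data) / len(data)


train_acc = agreement(student, D)    # 1.0 means the student reached S
test_acc = agreement(student, test)  # 1.0 would mean it also landed in T
print(train_acc, test_acc)
```

Because the teacher's scores are never exactly zero, its labels are linearly separable with a margin, so the perceptron convergence theorem guarantees the student fits D in finitely many updates (within the 1000-epoch budget for this teacher). Whether it also lands in T is the open, behavioral question.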
I am not sure how legible that was; this comment box is a hard place to express mathematical ideas in...
===
Anyway, the upshot is that I think this lets us directly compare two learners. If learner A is trained on the outputs of learner B, do they generalize in similar ways? Do they find the right subsets of the function space as the effective target space?
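One way to sketch this comparison uses two hypothetical students trained on the same teacher-generated D: a 1-nearest-neighbour learner and a pure memorizer that answers 0 on anything it has not seen. Both land in S, but they can generalize very differently, which is what off-training agreement with the teacher picks up. The teacher (majority vote over 3 bits), the inputs in D, and both student rules are illustrative choices, not part of the original argument.

```python
from itertools import product

A = list(product([0, 1], repeat=3))


def teacher(x):
    """Hypothetical learner B: majority vote over the bits."""
    return int(sum(x) >= 2)


# Learner B generates the training set by being queried.
D = [(x, teacher(x)) for x in [(0, 0, 0), (1, 1, 1), (1, 0, 0)]]
seen = {x for x, _ in D}


def nearest_neighbour_student(x):
    """Student 1: copy the label of the closest training input (Hamming
    distance; ties go to whichever point comes first in D)."""
    def hamming(pair):
        return sum(a != b for a, b in zip(pair[0], x))
    return min(D, key=hamming)[1]


def memorizing_student(x):
    """Student 2: look up training labels exactly, answer 0 everywhere else."""
    return dict(D).get(x, 0)


def off_train_agreement(f, g):
    """Fraction of inputs outside D on which f and g give the same label."""
    unseen = [x for x in A if x not in seen]
    return sum(f(x) == g(x) for x in unseen) / len(unseen)


for name, student in [("nearest-neighbour", nearest_neighbour_student),
                      ("memorizer", memorizing_student)]:
    fits_D = all(student(x) == y for x, y in D)
    print(name, fits_D, off_train_agreement(student, teacher))
# nearest-neighbour True 1.0
# memorizer True 0.4
```

Both students have zero training loss, so training loss alone cannot tell them apart; only agreement off D reveals that the first one picked out the teacher's T and the second did not.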