Musings on Cumulative Cultural Evolution and AI
This post might be interesting to you if you want a conceptual model of cumulative cultural evolution and/or you’re curious how cumulative cultural evolution impacts AI forecasting and development.
In particular, I’ll argue that cumulative cultural evolution makes one argument for the discontinuity of AI progress more plausible, and I’ll sketch out at least two possible paths of development that are worth further investigation.
Cumulative Cultural Evolution
Humans have altered more than one-third of the earth’s land surface. We cycle more nitrogen than all other terrestrial life forms combined and have now altered the flow of two-thirds of the earth’s rivers. Our species uses 100 times more biomass than any large species that has ever lived. If you include our vast herds of domesticated animals, we account for more than 98% of terrestrial vertebrate biomass. — Joseph Henrich
Cumulative cultural evolution is often framed as an answer to the general question: what makes humans so successful relative to other apes? Other apes do not occupy as many continents, are not as populous, and do not shape or use the environment to the extent that humans do.
There is a smorgasbord of different accounts of this, referencing:
humans’ capacity for deception, which ratcheted up human sociality and intelligence via arms-race-like effects
humans’ general intelligence or improvisational intelligence
a package of instinctual modules, such as language, intelligence, social learning, folk physics and biology
humans’ cultural learning abilities
In this section, I’ll explain the account that refers to humans’ cultural learning abilities. Cultural learning typically refers to a subset of social learning abilities such as mindreading, imitation, and teaching. I’ll use cultural and social learning interchangeably here.
Muthukrishna, Doebeli, Chudek, and Henrich’s recent paper “The Cultural Brain Hypothesis: How culture drives brain expansion, sociality, and life history” contains one of the best conceptual models of the cumulative cultural evolution story.
In the paper, Muthukrishna and co are in the business of making mathematical models which they can use to simulate the impact of brain size, sociality, mating structures, and life history. I’ll go over the conceptual features of the primary model here and motivate its plausibility.
The components of the model are:
Muthukrishna and co make the following assumptions:
larger brains are more expensive
larger brains correspond to an increased capacity to store and manage adaptive information
adaptive information reduces the probability that its bearer will die or increases the probability that it will reproduce.
One way to think about the role of adaptive information is to think of humans as occupying “the cognitive niche.” Just as there are niches that particular species may exploit due to their biological features, a species may exploit a cognitive niche through gathering, managing, and applying information. Humans are apparently unique with regard to the range of information we can gather, the ability to apply that information during development time, and the ability to pass that information on through generations. Occupying the cognitive niche allows a species like Homo sapiens to innovate new tools and manipulate the environment using everything from fire to smartphones.
Managing and storing information requires large brains and large brains are expensive. Large brains are harder to feed, birth, and develop. There are then two conflicting selection pressures: the costliness of large brains and the advantageousness of adaptive information.
To tease out the impact of these different pressures, Muthukrishna and co move agents through a lifecycle of BIRTH, LEARNING, MIGRATION, and SELECTION. These steps are straightforward: agents are born, spend time learning asocially or socially, migrate between groups, and then are selected according to the amount of adaptive knowledge they acquired, the costliness of their brain size, and the environmental payoff.
The model includes the following parameters:
how accurately can an agent learn from others?
how efficient is asocial learning?
what is the environmental payoff for adaptive knowledge?
how much more do those with more adaptive knowledge reproduce? Groups with a pair bonding structure will have less reproductive skew. Groups with a polygynous mating structure will have more reproductive skew.
These parameters are modified for different agents and groups.
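To make the lifecycle and its four parameters concrete, here is a minimal toy sketch of one generation. It is not the model from the paper: the names (Agent, learn, fitness, step), the linear fitness function, the mutation sizes, and the brain_cost and migration_rate values are all assumptions chosen for illustration. Reproductive skew is approximated by raising fitness to a power before picking parents, so a larger skew concentrates reproduction in the fittest agents.

```python
import random
from dataclasses import dataclass


@dataclass(eq=False)
class Agent:
    brain_size: float       # storage capacity for adaptive knowledge, and its cost
    social_bias: float      # probability of learning socially rather than asocially
    knowledge: float = 0.0  # adaptive knowledge acquired during LEARNING


def learn(agent, models, fidelity, asocial_eff, rng):
    """LEARNING: copy the most knowledgeable model, or innovate alone."""
    if models and rng.random() < agent.social_bias:
        gained = fidelity * max(m.knowledge for m in models)
    else:
        gained = asocial_eff * agent.brain_size
    agent.knowledge = min(agent.brain_size, gained)  # the brain caps what is stored


def fitness(agent, payoff, brain_cost):
    """Knowledge helps, brain size hurts; floored so weights stay positive."""
    return max(1e-9, 1.0 + payoff * agent.knowledge - brain_cost * agent.brain_size)


def step(groups, fidelity, asocial_eff, payoff, skew,
         brain_cost=0.5, migration_rate=0.05, rng=random):
    """One generation: SELECTION picks parents, then BIRTH, LEARNING, MIGRATION."""
    new_groups = []
    for group in groups:
        # SELECTION: higher skew concentrates reproduction in the fittest agents.
        weights = [fitness(a, payoff, brain_cost) ** skew for a in group]
        parents = rng.choices(group, weights=weights, k=len(group))
        # BIRTH: children inherit traits with small (assumed) mutations.
        children = [
            Agent(
                brain_size=max(0.1, p.brain_size + rng.gauss(0, 0.05)),
                social_bias=min(1.0, max(0.0, p.social_bias + rng.gauss(0, 0.05))),
            )
            for p in parents
        ]
        # LEARNING: children learn from the previous generation or on their own.
        for child in children:
            learn(child, group, fidelity, asocial_eff, rng)
        new_groups.append(children)
    # MIGRATION: each agent moves to a randomly chosen group with small probability.
    if len(new_groups) > 1:
        for group in new_groups:
            for agent in list(group):
                if len(group) > 1 and rng.random() < migration_rate:
                    group.remove(agent)
                    rng.choice(new_groups).append(agent)
    return new_groups
```

Calling step repeatedly on a list of groups, for example three groups of twenty Agent(1.0, 0.5) instances, and tracking the average brain_size and social_bias is one way to explore how the four parameters interact. Any trends it shows are properties of this toy, not results from the paper.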
What is the result of the simulation? The simulation outputs the following causal relationships:
Larger brains allow for more adaptive knowledge. This creates a selection pressure for larger brains. This is true for both social and asocial learners.
More adaptive knowledge allows for a larger carrying capacity. As agents invest in more social learning, this creates a selection pressure for larger groups, as larger groups are richer sources of adaptive knowledge than smaller groups.
Larger groups of individuals will tend to have more adaptive knowledge. This creates a selection pressure for longer juvenile periods for social learners.
Extended juvenile periods create selection pressures for better learning biases, in particular biases concerning who to learn from.
Better learning biases and oblique learning “lead to the realm of cumulative cultural evolution”
Here’s a nifty picture displaying these relationships:
These results are, for the most part, weakly supported by empirical evidence.
Brain size and social learning
This relationship appears to hold for primates and birds.
Larger groups & larger brains
This relationship holds for primates, but not for other taxa.
Brain size and juvenile period
There is a positive relationship for primates between brain size and juvenile period.
Group size and juvenile period
There is a positive relationship for primates between group size and absolute juvenile period.
The general model provides a rather neat picture. Crucially, there are positive feedback loops between larger brains and social learning in the right environment. This in turn pushes towards longer juvenile periods and larger groups.
Cumulative Cultural Evolution
Under the right parameter values, Muthukrishna and co saw that the model can generate a species that undergoes something like cumulative cultural evolution. Recall the parameters of the model: transmission fidelity, the efficiency of asocial learning, the environmental payoff for adaptive knowledge, and reproductive skew.
Let cumulative culture be the phenomenon where a group contains far more adaptive knowledge than could likely be generated by all of its individual members via asocial learning. Basically, a group that exhibits cumulative culture is one where it is exceedingly unlikely that any of its members could generate its pool of adaptive knowledge via asocial learning.
What values of the model parameters would produce cumulative culture?
Before reading on, answer this question on your own.
Very high transmission fidelity.
In order for adaptive knowledge to accumulate in a species that species needs to be able to accurately transmit that knowledge through generations.
Low reproductive skew.
What would happen if there were a high reproductive skew? Then individuals who learn asocially and have especially large brains would be rewarded and survive. However, populations with large-brained asocial learners eventually go extinct. This is because, as large-brained asocial learners survive, variance in social learning ability decreases. As variance in social learning ability decreases, the population is: (i) unable to cheaply accumulate knowledge via social learning and (ii) unable to transition to smaller-brained social learners (remember, brains are expensive!).
Moderate asocial learning.
Social learners face a bootstrapping problem—the adaptive knowledge must come from somewhere. This means that the species needs to be able to generate innovations asocially. However, the species cannot invest in asocial learning too much, otherwise asocial learning may be too efficient and social learning will not take off.
Finally, adaptive knowledge must pay off. Brains are too expensive to grow unless there is a significant benefit to their becoming larger. The ecology can account for that benefit.
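The role of transmission fidelity can be illustrated with a toy recurrence. Suppose, purely as an assumption for illustration and not as something taken from the paper, that each generation retains a fraction fidelity of the previous generation’s knowledge and adds a fixed amount innovation through asocial learning. The function names below are hypothetical.

```python
def accumulated_knowledge(fidelity, innovation, generations, start=0.0):
    """Iterate K_next = fidelity * K + innovation for a number of generations."""
    k = start
    for _ in range(generations):
        k = fidelity * k + innovation
    return k


def steady_state(fidelity, innovation):
    """Long-run knowledge pool of the recurrence: innovation / (1 - fidelity)."""
    if not 0.0 <= fidelity < 1.0:
        raise ValueError("fidelity must be in [0, 1) for the pool to level off")
    return innovation / (1.0 - fidelity)
```

Under this assumption, a fidelity of 0.5 levels off at only twice what a single generation innovates, while a fidelity of 0.99 levels off at roughly a hundred times that amount. This is the sense in which a group’s pool of knowledge can come to far exceed what any one member could generate asocially.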
These parameter values offer an explanation of human brain size, social learning capabilities, and general success. Moreover, they may also explain why species like humans are so rare. In order for there to be a species that invests in social learning at a very high rate:
Ecologies must be sufficiently rich.
There must be a low reproductive skew
A species must be in the goldilocks zone with respect to asocial learning
Transmission fidelity must be very high
In this environment there is significant pressure to:
increase brain size
increase social learning
increase social learning efficiency
I personally think this stuff is very exciting. We have a model for how brain sizes could 3x and how humans can emerge as a uniquely social species. There are additional insights about the value of information and importance of mating structure.
AI Forecasting and Development
Now, what, if anything, is the import of this for AI forecasting and development?
There’s an argument that’s been floating around for some time that goes something like this:
Humans are vastly more successful in certain ways than other hominids, yet in evolutionary time, the distance between them is small. This suggests that evolution induced discontinuous progress in returns, if not in intelligence itself, somewhere approaching human-level intelligence. If evolution experienced this, this suggests that artificial intelligence research may do also. — AI Impacts
One can push back on the argument above with the claims that:
Evolution wasn’t optimizing for improving human intelligence
Human evolutionary success is not due to human intelligence
In response to the first claim, there’s good reason to believe, from the model above, that there are feedback loops that enable a species’ brain size and social learning capabilities to take off. Moreover, evolution was “targeting” this takeoff due to the selection pressures for adaptive information and against large brains. Large-brained social learners were significantly “rewarded” in the right ecosystems. In the relevant sense, evolutionary dynamics optimized for and pushed humans’ asocial and social learning capabilities upward.
In response to the second claim, whether or not humans are significantly better asocial learners than our primate relatives is unclear. Obviously, adult humans have significantly higher intelligence than our primate relatives; however, it’s unclear to what extent this higher intelligence is a result of asocial learning rather than social learning. What is clear is that, from a very early age, humans are vastly better social learners than our primate relatives. Given this, it’s plausible to claim that human success, at least relative to other apes, derived from our asocial and social learning abilities. These abilities enabled humans to occupy the cognitive niche.
Cumulative cultural evolution renders this argument more plausible. However, before getting too excited, it’s worth noting that cumulative cultural evolution is one plausible account of human success and intelligence among an array of plausible accounts.
So much for forecasting; what of artificial intelligence development? The development of human intelligence suggested by the cumulative cultural evolution story followed the chain below:
moderate asocial learning ⇒ social learning.
One can imagine machine learning work developing sufficient asocial learning techniques and then ratcheting capabilities forward by combining previous work through social learning techniques (potentially via imitation learning or model transfer, but likely much more sophisticated techniques). On this model, asocial learning (probably made up of a number of different modules and techniques) enables social learning to become a winning strategy. However, social learning is an independent thing; it is not built on top of asocial learning modules.
Related to this, more work needs to be done determining which capacities primarily drive human social learning. Henrich suggests that it is both mindreading and imitation. Tomasello’s work stresses mindreading, in particular the ability of humans to develop joint attention. Answers to this issue would provide at least some evidence about which machine learning algorithms are likely to be more successful.
Another issue brought up by this work is whether social learning is an instinct, that is (roughly) whether it is encoded by genes and not brought into existence via culture, or whether it is a gadget. A gadget is not encoded by genetic information, but is instead developed by cultural means. Suggestive work by Heyes argues that social learning capabilities could be developed from humans’ temperament, high working memory, and attention capabilities. If this is so, then the development of artificial intelligence sketched above is likely flawed. It may instead look like:
temperament + computation power + working memory + attention ⇒ social learning
On this model, not only does asocial learning enable social learning to be a winning strategy, but asocial learning capabilities also compose social learning abilities. Social learning is really not that different from asocial learning; it is just a layer built on top of lower-level intelligence systems.
Both of these models suggest that AI development is currently bottlenecked on the asocial learning step. However, once a threshold for asocial learning is reached, intelligence will increase very rapidly.
There’s a lot more to do here. I hope to have persuasively motivated the cumulative cultural evolution story and the idea that it has important upshots for AI development and forecasting.