Good question! The idea is, the brain is supposed to do something specific and useful—run a certain algorithm that systematically leads to ecologically-adaptive actions. The size of the genome limits the amount of complexity that can be built into this algorithm. (More discussion here.) For sure, the genome could build a billion different “cell types” by each cell having 30 different flags which are on and off at random in a collection of 100 billion neurons. But … why on earth would the genome do that? And even if you come up with some answer to that question, it would just mean that we have the wrong idea about what’s fundamental; really, the proper reverse-engineering approach in that case would be to figure out 30 things, not a billion things, i.e. what is the function of each of those 30 flags.
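(To make the arithmetic explicit: 30 independent binary flags already give 2^30 ≈ 10^9 combinations, so that kind of “diversity” is combinatorially cheap. Here’s a minimal sketch; the 30 flags and the random sampling are purely illustrative, not a model of any real mechanism.)

```python
import random

N_FLAGS = 30   # independent on/off flags per cell (illustrative number from the text)

# 30 binary flags allow 2**30 distinct combinations -- about a billion "cell types".
print(f"{2**N_FLAGS:,} possible flag combinations")   # 1,073,741,824

# Draw a million random flag patterns; nearly all of them come out distinct,
# so a brain full of such cells would indeed contain ~a billion nominal "types".
sample = 1_000_000
patterns = {random.getrandbits(N_FLAGS) for _ in range(sample)}
print(f"{len(patterns):,} distinct patterns in a sample of {sample:,}")
```

The point being: all of that apparent diversity costs the genome essentially nothing beyond specifying the 30 flags themselves.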
A kind of exception to the rule that the genome limits the brain algorithm complexity is that the genome can (and does) build within-lifetime learning algorithms into the brain, and then those algorithms run for a billion seconds and create a massive quantity of intricate complexity in their “trained models”. To understand why an adult behaves the way they do in any possible situation, there are probably billions of things to be reverse-engineered and understood, rather than low thousands of things (a rough scale comparison is sketched just after the list below). However, as a rule of thumb, I claim that:
when the evolutionary learning algorithm adds a new feature to the brain algorithm, it does so by making more different idiosyncratic neuron types and synapse types and neuropeptide receptors and so on,
when one of the brain’s within-lifetime learning algorithms adds a new bit of learned content to its trained model, it does so by editing synapses.
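Here’s the rough scale comparison promised above, just to convey the orders of magnitude; every number is an assumed round figure rather than a measurement:

```python
# Back-of-envelope orders of magnitude (all figures are rough, assumed round numbers).

genome_specified_features = 5_000   # idiosyncratic neuron types, synapse types, receptors, ... ("low thousands")
learned_parameters = 1e14           # synapses in an adult human brain, very roughly
seconds_of_learning = 1e9           # ~30 years of within-lifetime learning, order of magnitude

print(f"evolution-designed features to reverse-engineer: ~{genome_specified_features:,}")
print(f"learned content in one 'trained model' (synapses): ~{learned_parameters:.0e}")
print(f"gap between the two channels: ~{learned_parameters / genome_specified_features:.0e}x")
```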
Again, I only claim that these are rules of thumb, not hard-and-fast rules, but I do think they’re great starting points. Even if there’s a nonzero amount of learned content storage via gene expression, I propose that thinking of it as “changing the neuron type” is not a good way to think about it; it’s still “the same kind of neuron”, and part of the same subproject of the “understanding the brain” megaproject; it’s just that the neuron happens to be storing some adjustable parameter in its nucleus and acting differently in accordance with that.
By contrast, medium spiny neurons versus Purkinje cells versus cortical pyramidal neurons versus magnocellular neurosecretory cells etc. etc. are all just wildly different from each other—they look different, they act different, they play profoundly different roles in the brain algorithm, etc. The genome clearly needs to be dedicating some of its information capacity to specifying how to build each and every one of those cell types, individually, such that each of them can play its own particular role in the brain algorithm.
Does that help explain where I’m coming from?
Thanks for the reply! I’m familiar with (and am skeptical of) the basic information theoretic argument as to why genome size should constrain the complexity of whatever algorithm the brain is running, but my question here is more specific. What I’m not clear on is how those two numbers (20,000 genes and a few thousand neuron types) specifically relate to each other in your model of brain functioning. Is the idea that each neuron type roughly corresponds to the expression of one or two specific genes, and thus you’d expect <20,000 neuron types?
“For sure, the genome could build a billion different “cell types” by each cell having 30 different flags which are on and off at random in a collection of 100 billion neurons. But … why on earth would the genome do that?”
Interestingly, the genome does do this! Protocadherins in vertebrates and DSCAM1 are expressed in exactly this way, and it’s thought to help neurons to distinguish themselves from other neurons, which is essential for neuronal self-avoidance: https://en.wikipedia.org/wiki/Neuronal_self-avoidance#Molecular_basis_of_self-avoidance
Of course in an emulation you could probably just tell the neurons to not interact with themselves so this crazy system wouldn’t be necessary, but it is a nice example of how biology does things you might a priori think would never happen.
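To give a feel for the combinatorics of that barcode scheme: each neuron stochastically expresses a random subset of isoforms, so sister branches of the same neuron always carry identical barcodes (and repel), while two different neurons essentially never match by chance. A minimal sketch, with ballpark isoform counts (roughly 60 clustered protocadherin isoforms, ~15 expressed per neuron; the exact numbers are illustrative):

```python
import math
import random

N_ISOFORMS = 60   # clustered protocadherin isoforms, ballpark
PER_NEURON = 15   # isoforms stochastically expressed per neuron, ballpark

# Number of possible "barcodes" a neuron could end up wearing:
print(f"{math.comb(N_ISOFORMS, PER_NEURON):.1e} possible isoform combinations")  # ~5e13

def random_barcode():
    """One neuron's stochastic choice of isoforms."""
    return frozenset(random.sample(range(N_ISOFORMS), PER_NEURON))

neuron_a = random_barcode()
neuron_b = random_barcode()

# Two branches of neuron A carry the same barcode -> recognized as "self" -> repel.
print("branch of A vs branch of A:", neuron_a == neuron_a)   # True
# A branch of A vs a branch of B almost never match -> treated as "other".
print("branch of A vs branch of B:", neuron_a == neuron_b)   # virtually always False
```

With something like 5×10^13 possible barcodes, chance collisions between unrelated neurons are negligible, which is what makes the random expression useful for self-avoidance.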
“What I’m not clear on is how those two numbers (20,000 genes and a few thousand neuron types) specifically relate to each other in your model of brain functioning.”
Start with 25,000 genes, but then reduce it a bunch because they also have to build hair follicles and the Golgi apparatus and on and on. But then increase it a bit too because each gene has more than one design degree of freedom, e.g. a protein can have multiple active sites, and there’s some ability to tweak which molecules can and cannot reach those active sites and how fast etc. Stuff like that.
Putting those two factors together, I dunno, I figure it’s reasonable to guess that the genome can have a recipe for low thousands of distinct neuron types, each with its own evolutionarily-designed properties and each playing a specific evolutionarily-designed role in the brain algorithm.
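Spelling that guess out as explicit arithmetic, with every input below being an assumed round number for illustration rather than a measured quantity:

```python
# Fermi estimate: how many distinct, genome-specified neuron types could the
# genome plausibly encode? All inputs are assumed round figures.

total_genes = 25_000           # protein-coding genes, roughly
brain_specific_fraction = 0.3  # guess: fraction of the "design budget" available for brain wiring
dof_per_gene = 3               # guess: multiple active sites, regulatory tweaks, etc.

design_dof = total_genes * brain_specific_fraction * dof_per_gene
print(f"~{design_dof:,.0f} brain-relevant design degrees of freedom")

# If specifying one idiosyncratic neuron type and its role in the circuit
# takes, say, a handful of degrees of freedom, you land in the low thousands:
dof_per_neuron_type = 10       # guess
print(f"~{design_dof / dof_per_neuron_type:,.0f} distinct neuron types supportable")
```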
And that “low thousands” number is ballpark consistent with the slide-seq thing, and also ballpark consistent with what you get by counting the number of neuron types in a random hypothalamus nucleus and extrapolating. High hundreds, low thousands, I dunno, I’m treating it as a pretty rough estimate.
Hmm, I guess when I think about it, the slide-seq number and the extrapolation number are probably more informative than the genome number. Like, can I really rule out “tens of thousands” just based on the genome size? Umm, not with extreme confidence, I’d have to think about it. But the genome size is at least a good “sanity check” on the other two methods.
“Is the idea that each neuron type roughly corresponds to the expression of one or two specific genes, and thus you’d expect <20,000 neuron types?”
No, I wouldn’t necessarily expect something so 1-to-1. Just the general information theory argument. If you have N “design degrees of freedom” and you’re trying to build >>N specific machines that each do a specific thing, then you get stuck on the issue of crosstalk.
For example, suppose that some SNP changes which molecules can get to the active site of some protein. It makes Purkinje cells more active, but also increases the ratio of striatal matrix cells to striosomes, and also makes auditory cortex neurons more sensitive to oxytocin. Now suppose there’s very strong evolutionary pressure for Purkinje cells to be more active. Then maybe that SNP is going to spread through the population. But it’s going to have detrimental side-effects on the striatum and auditory cortex. Ah, but that’s OK, because there’s a different mutation to a different gene which fixes the now-suboptimal striatum, and yet a third mutation that fixes the auditory cortex. Oops, but those two mutations have yet other side-effects on the medulla and … Etc. etc.
…Anyway, if that’s what’s going on, that can be fine! Evolution can sort out this whole system over time, even with crazy side-effects everywhere. But only as long as there are enough “design degrees of freedom” to actually fix all these problems simultaneously. There do have to be more “design degrees of freedom” in the biology / genome than there are constraints / features in the engineering specification, if you want to build a machine that actually works. There doesn’t have to be a 1-to-1 match between design-degrees-of-freedom and items on your engineering blueprint, but you do need that inequality to hold. See what I mean?
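A toy way to see that inequality is as a linear least-squares analogy (made-up numbers; the real genotype-to-phenotype map is of course nothing this simple): treat each mutation-accessible knob as a variable that affects every trait at once (crosstalk everywhere), and each desired trait value as a constraint. With many more constraints than knobs, some targets are generically unreachable; with at least as many knobs as constraints, every target can be hit despite the rampant side-effects:

```python
import numpy as np

rng = np.random.default_rng(0)

def best_achievable_error(n_knobs, n_traits):
    """Least-squares residual when every knob affects every trait (dense crosstalk)."""
    A = rng.normal(size=(n_traits, n_knobs))   # how each knob shifts each trait
    target = rng.normal(size=n_traits)         # desired trait values
    x, residual, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.linalg.norm(A @ x - target)

# More traits than knobs: some targets can't be hit, no matter how the knobs are tuned.
print("50 knobs, 200 traits -> residual", round(best_achievable_error(50, 200), 3))

# At least as many knobs as traits: all targets hit (residual ~ 0),
# even though every knob has side-effects on every trait.
print("200 knobs, 50 traits -> residual", round(best_achievable_error(200, 50), 3))
```

The crosstalk per se isn’t the obstacle; the count of independent degrees of freedom versus constraints is.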
“Interestingly, the genome does do this! Protocadherins in vertebrates and DSCAM1 are expressed in exactly this way, and it’s thought to help neurons to distinguish themselves from other neurons…”
“Of course in an emulation you could probably just tell the neurons to not interact with themselves”
Cool example, thanks! Yeah, that last part is what I would have said. :)
Interesting...I think I vaguely understand what you’re talking about, but I’m doubtful that these concepts really apply to biology. Especially since your example is about constraints on evolvability rather than functioning. In practice that is pretty much how everything tends to work, with absolutely wild amounts of pleiotropy and epistasis, but that’s not a problem unless you want to evolve a new function. Which is probably why the strong strong evolutionary default is towards stasis, not change.
I guess my priors are pretty different because my background is in virology, where our expectation (after decades of painful lessons) is that the default is for proteins to be wildly multifunctional, with many many many “design degrees of freedom.” Granted viruses are a bit of a special case, but I do think they can provide a helpful stress test/simpler model for information theoretic models of genome function.