I’m having trouble parsing what you’ve said here in a way that makes sense to me. Let me try to lay out my understanding of the facts very explicitly, and you can chime in with disagreements / corrections / clarifications:
The human brain has, very roughly, 100B neurons (nodes) and 100T synapses (connections). Each synapse represents at least one “parameter”, because connections can have different strengths. I believe there are arguments that it would in fact take multiple parameters to characterize a synapse (connection strength + recovery time + sensitivity to various neurotransmitters + ???), and I’m sympathetic to this idea on the grounds that everything in the body turns out to be more complicated than you think, but I don’t know much about it.
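For concreteness, here’s that back-of-the-envelope arithmetic as a quick Python sketch. The counts are the rough figures above, and the per-synapse multiplier is just a placeholder for the “it might take multiple parameters to characterize a synapse” idea, not a claim about the right value:

```python
# Rough comparison of brain "parameters" vs. GPT-4's reported parameter count.
# All figures are the rough estimates from the discussion above; the
# params-per-synapse multiplier is a deliberately unknown placeholder.
brain_synapses = 100e12      # ~100T synapses (connections)
gpt4_params = 1.8e12         # reported estimate for GPT-4

for params_per_synapse in (1, 3, 10):   # strength alone, or strength + extras
    brain_params = brain_synapses * params_per_synapse
    print(f"{params_per_synapse:>2} param(s)/synapse -> "
          f"brain/GPT-4 ratio ≈ {brain_params / gpt4_params:,.0f}x")
```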
Regarding GPT-4, I believe the estimate was that it has 1.8 trillion parameters, which if shared weights are used may not precisely correspond to connections or FLOPs. For purposes of information storage (“learning”) capacity, parameter count seems like the correct metric to focus on? (In the post, I equated parameters with connections, which is incorrect in the face of shared weights, but does not detract from the main point, unless you disagree with my claim that parameter count is the relevant metric here.)
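To make the shared-weights point concrete, here’s a minimal sketch (a toy 1D convolution, sizes arbitrary) where the same few stored parameters get applied at many positions, so parameters, connections (weight applications), and FLOPs all come apart:

```python
import numpy as np

# A 1D convolution reuses the same kernel at every position, so
# stored parameters << weight applications ("connections") << FLOPs.
seq_len, kernel_size = 1024, 5
x = np.random.randn(seq_len)
kernel = np.random.randn(kernel_size)   # the only learned parameters

y = np.convolve(x, kernel, mode="valid")

parameters = kernel.size                 # 5 stored weights
applications = kernel.size * y.size      # 5 * 1020 weight uses
flops = 2 * applications                 # one multiply + one add per use
print(parameters, applications, flops)   # 5, 5100, 10200
```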
To your specific points:
Probably the effective number of parameters in the human brain is actually lower than 100 trillion because many of these “parameters” are basically randomly initialized or mostly untrained. (Or are trained very slowly/weakly.) The brain can’t use a global learning algorithm, so it might effectively use parameters much less efficiently.
What is your basis for this intuition? LLM parameters are randomly initialized. Synapses might start with better-than-random values (I have no idea), but presumably not worse than random. LLMs and brains both then undergo a training process; what makes you think that the brain is likely to do a worse job of training its available weights, or that many synapses are “mostly untrained”?
Also note that the brain has substantial sources of additional parameters that we haven’t accounted for yet: deciding which synapses to prune (out of the much larger early-childhood count), which connections to form in the first place (the connection structure of an LLM can be described in a relative handful of bits, while the connection structure of the brain has an enormous number of free parameters; I don’t know how “valuable” those parameters are, but natural systems are clever!), and where to add additional connections later in life.
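As a very naive illustration of how much information the brain’s wiring diagram could carry (my assumption here is the crudest possible encoding, one independent target choice per synapse; real wiring is far more constrained):

```python
import math

# Naive upper bound on the information in the brain's wiring diagram:
# each synapse independently picks a target neuron out of ~100B.
neurons = 100e9
synapses = 100e12

bits_per_synapse = math.log2(neurons)        # ≈ 36.5 bits
total_bits = synapses * bits_per_synapse     # ≈ 3.7e15 bits
print(f"{bits_per_synapse:.1f} bits/synapse, "
      f"{total_bits:.1e} bits total (~{total_bits / 8 / 1e12:.0f} TB)")
```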
It’s a bit confusing to describe GPT-4 as having 1.8 trillion connections, as 1.8 trillion is (roughly) the number of floating point operations, not the number of neurons.
I never mentioned neurons. 1.8 trillion is, I believe, the best estimate for GPT-4’s parameter count. Certainly we know that the largest open-weight models have parameter counts of this order of magnitude (somewhat smaller but not an OOM smaller). As noted, I forgot about shared weights when equating parameters to connections, but again I don’t think that matters here. FLOPs to my understanding would correspond to connections (and not parameter counts, if shared weights are used), but I don’t think FLOPs are relevant here either.
In general, the analogy between the human brain and LLMs is messy because a single neuron probably has far fewer learned parameters than an LLM neuron, but plausibly somewhat more than a single floating point number.
GPT-5 estimates that GPT-4 had just O(100M) neurons. Take that figure with a grain of salt, but I mention it to point out that in both modern LLMs and the human brain, there are far more connections / synapses than nodes / neurons, and the vast majority of parameters will be associated with connections, not nodes. (Which is why I didn’t mention neurons in the post, and I don’t think it’s useful to talk about learned parameters in reference to neurons.)
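The connections-dwarf-nodes point is just that transformer parameter counts scale with the square of the hidden width while “neuron” counts scale linearly with it. A quick sketch with illustrative dimensions (GPT-3-sized, since GPT-4’s actual architecture isn’t public):

```python
# Why parameters dwarf "neurons" in a transformer: params grow ~ d^2 per layer,
# activation units grow ~ d per layer. Dimensions below are illustrative only.
d, n_layers = 12288, 96          # GPT-3-sized, for illustration

params_per_layer = 12 * d * d    # rough count: attention (~4d^2) + MLP (~8d^2)
neurons_per_layer = 5 * d        # rough count: d residual units + ~4d MLP units

total_params = n_layers * params_per_layer
total_neurons = n_layers * neurons_per_layer
print(f"params ≈ {total_params:.1e}, neurons ≈ {total_neurons:.1e}, "
      f"ratio ≈ {total_params / total_neurons:,.0f}")
```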
Regarding GPT-4, I believe the estimate was that it has 1.8 trillion parameters, which if shared weights are used may not precisely correspond to connections or FLOPs.
For standard LLM architectures, forward pass FLOPs per token are ≈ 2⋅parameters (because of the multiply and accumulate for each matmul parameter). It could be that GPT-4 has some non-standard architecture where this is false, but I doubt it.
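Counting it out explicitly for a single matrix-vector product (arbitrary sizes), just to show where the factor of 2 comes from:

```python
import numpy as np

# Count FLOPs for y = W @ x explicitly: each of the m*n weights contributes
# one multiply and one accumulate, so FLOPs ≈ 2 * parameters.
m, n = 4096, 1024
W = np.random.randn(m, n)        # m*n parameters
x = np.random.randn(n)

params = W.size
multiplies = m * n
adds = m * n                     # accumulates (strictly m*(n-1), close enough)
print(params, multiplies + adds, (multiplies + adds) / params)   # ratio ≈ 2.0
```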
So, yeah, we agree here; I was just noting that connection == FLOP (roughly).
What is your basis for this intuition? [...] what makes you think that the brain is likely to do a worse job of training its available weights, or that many synapses are “mostly untrained”?
The brain’s learning is purely local, which makes training all the parameters efficiently much harder. My understanding is that, at least in the vision-focused parts of the brain, there is substantial use of randomly initialized filters, and I seem to recall an argument made somewhere (by Steven Byrnes?) that the effective number of parameters was much lower. Sorry I can’t say more here.
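For what it’s worth, the “randomly initialized filters” idea has the same flavor as random-feature models in ML, where a fixed random projection is never trained and only a readout on top is learned. A toy sketch of that pattern (purely illustrative; not a claim about what visual cortex actually does):

```python
import numpy as np

# Toy "random features" pattern: a fixed, untrained random projection plus a
# nonlinearity, with only a linear readout trained on top. The projection's
# parameters carry essentially no learned information.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))                    # toy inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)  # toy regression target

W_random = rng.normal(size=(32, 256))             # never trained
features = np.maximum(X @ W_random, 0.0)          # fixed random ReLU features

# Train only the readout (ordinary least squares).
readout, *_ = np.linalg.lstsq(features, y, rcond=None)
pred = features @ readout
print("train MSE:", float(np.mean((pred - y) ** 2)))
```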