I had a thought. When comparing parameter counts of LLMs to synapse counts, for parity the parameter count of each attention head should be multiplied by the number of positions it can attend to, or at least by the logarithm of that number. That would account for about an order of magnitude of the disparity, so make the remaining gap 2-3 orders of magnitude. That sounds rather more plausible as the distance from sparks of AGI to full AGI.
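To put rough numbers on the adjustment, here is a minimal sketch. Everything in it is an assumption for illustration rather than a figure from any particular model: the function name `effective_param_multiplier`, a context length of 2048 positions, and attention weights making up about one third of the parameters (as in a standard transformer block with 4d² attention weights and 8d² MLP weights, ignoring embeddings).

```python
import math


def effective_param_multiplier(context_len, attn_fraction, use_log=False):
    """Factor by which the raw parameter count grows when attention
    parameters are weighted by the number of attendable positions
    (or by log2 of that number when use_log is True)."""
    weight = math.log2(context_len) if use_log else context_len
    # Attention parameters are scaled by `weight`; all others count once.
    return attn_fraction * weight + (1.0 - attn_fraction)


# Hypothetical values, chosen only to illustrate the arithmetic.
context_len = 2048       # assumed number of positions a head can attend to
attn_fraction = 1 / 3    # assumed share of parameters in attention layers

linear = effective_param_multiplier(context_len, attn_fraction)
log = effective_param_multiplier(context_len, attn_fraction, use_log=True)

print(f"linear weighting: x{linear:.0f} (~{math.log10(linear):.1f} orders of magnitude)")
print(f"log weighting:    x{log:.1f} (~{math.log10(log):.1f} orders of magnitude)")
```

Under these assumed values the logarithmic weighting gives a bit under one order of magnitude and the linear weighting nearly three, so "about an order of magnitude" sits toward the conservative end of that range. The result is sensitive to both the context length and the attention share.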