Nobody knows what amount of compute is sufficient for AGI, in the sense of capability for mostly autonomous research, especially with some algorithmic improvements.
This is what I find really puzzling. The human brain, which crossed the sapience threshold only a quarter-million years ago, has O(10^14) synapses, and presumably a lot of evolved, genetically-determined inductive biases. Synapses have very sparse connectivity, so synapse counts should presumably be compared to parameter counts after sparsification, which tends to reduce them by 1-2 orders of magnitude. GPT-4 is believed to have O(10^12) parameters; it's an MoE model, so it has some sparsity and some duplication: call that O(10^10 or 10^11) for a comparable number. So GPT-4 is showing "sparks of AGI" something like 3 or 4 orders of magnitude before we would expect AGI from the biological parallel. I find that astonishingly low. Bear in mind also that a human brain only needs to implement one human mind, whereas an LLM is trying to learn to simulate every human who's ever written material on the Internet in any high- or medium-resource language, a clearly harder problem.
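The arithmetic behind that "3 or 4 orders of magnitude" can be sketched as follows; all figures are the rough order-of-magnitude assumptions from the paragraph above (rumoured parameter counts, a 1-2 order-of-magnitude sparsification discount), not measurements:

```python
import math

brain_synapses = 1e14   # human brain, O(10^14) synapses
gpt4_params = 1e12      # rumoured GPT-4 parameter count, O(10^12)

# Biological synapses are very sparsely connected, so compare against
# sparsified parameter counts; sparsification (plus MoE sparsity and
# duplication) plausibly cuts 1-2 orders of magnitude.
for reduction in (1e1, 1e2):
    effective = gpt4_params / reduction           # O(10^11) or O(10^10)
    gap = math.log10(brain_synapses / effective)
    print(f"effective params ~1e{math.log10(effective):.0f}: "
          f"~{gap:.0f} orders of magnitude below the brain")
```

With a 1 order-of-magnitude discount the gap is ~3 orders of magnitude; with 2 it is ~4, matching the range quoted above.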
I don't know if this is evidence that AGI is a lot easier than humans make it look, or a lot harder than GPT-4 makes it look. Maybe controlling a real human body is an incredibly compute-intensive task (but then I'm pretty sure that less than 90% of the human brain's synapses are devoted to motor control and controlling the internal organs, so more than 10% are used for language/visual processing, reasoning, memory, and executive function). Possibly we're mostly still fine-tuned for something other than being an AGI? Given the implications for timelines, I'd really like to know.
I had a thought. When comparing LLM parameter counts to synapse counts, for parity the parameter count of each attention head should be multiplied by the number of locations it can attend to, or at least by the logarithm of that number. That would account for about an order of magnitude of the disparity, narrowing it to 2-3 orders of magnitude. That sounds rather more plausible for the gap from sparks of AGI to full AGI.
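A minimal sketch of that adjustment, using hypothetical GPT-3-scale architecture numbers (width, layer count, and context length are illustrative assumptions, not GPT-4's actual configuration):

```python
import math

# Hypothetical, GPT-3-like architecture numbers for illustration only.
d_model = 12288    # model width
n_layers = 96      # transformer layers
context = 8192     # locations each head can attend to (context length)

# Per layer, the attention projections (Q, K, V, output) hold
# roughly 4 * d_model^2 parameters.
attn_params = n_layers * 4 * d_model ** 2

# Crediting each attention parameter with the logarithm of the number
# of attendable locations:
credit = math.log2(context)     # = 13 for an 8192-token context
print(f"log2(context) = {credit:.0f}, "
      f"i.e. roughly one extra order of magnitude for attention parameters")
```

A multiplier of ~13 on the attention parameters is about one order of magnitude, which is where the "account for about an order of magnitude" figure above comes from; using the raw location count instead of its logarithm would close the gap further still.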