I am a bit surprised that you found this post so novel. How is this different from what MIRI etc has been saying for ages? Specifically have you read these posts and corresponding discussion? Brain efficiency, Doom Part 1, Part 2
I came away from this mostly agreeing with jacob_cannell, though there wasn’t consensus.
Regarding this OP, I also agree with the main point that transformers won't scale to AGI, and I believe the brain's architecture is clearly better, though not to the degree claimed in the OP. I was going to write something up, but that would have taken time and the discussion would have moved on. Much of it came from a conversation with OpenAI o3, and I had planned to spend time checking all of its working. Anyway, here are some of the highlights (they sound plausible, but I haven't checked them). I can give more of the transcript if people think it worthwhile.
FLOPS vs TEPS (Traversed Edges Per Second) or something similar
The major point here is that not all FLOPS are equal, and FLOPS may not even be the right measure; something that combines FLOPS and bandwidth is probably better. Biological computing is comparatively stronger on TEPS than on FLOPS, yet FLOPS is the measure usually quoted. O3 claims you would need about 5,000 modern GPUs to match the TEPS of the human brain.
It also claims that a 1-million-GPU datacenter could only simulate a brain-like network with about 40× the synapses of the human brain, and only about 50× its effective TEPS.
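As a rough sanity check, here is the arithmetic those claims imply (my own back-of-envelope, using o3's per-GPU figures from further down):

```python
# Back-of-envelope using o3's claimed figures (not measurements).
H100_TEPS = 2e11                  # ~200 billion traversed edges/s per H100 if memory-bound
BRAIN_TEPS = 5_000 * H100_TEPS    # "5,000 GPUs to match the brain" => brain ~1e15 TEPS
N_GPUS = 1_000_000
SYNAPSES_PER_GPU = 40e9           # 80 GB HBM -> ~40 B one-byte synapses (o3's packing)
EFFECTIVE_TEPS = 5e16             # o3's estimate once inter-GPU traffic flattens scaling

total_synapses = N_GPUS * SYNAPSES_PER_GPU
print(f"implied brain throughput: {BRAIN_TEPS:.0e} TEPS")
print(f"synapses hosted: {total_synapses:.0e} "
      f"(~{total_synapses / 1e15:.0f}x a 1e15-synapse brain, the count implied by o3's 40x figure)")
print(f"effective TEPS:  {EFFECTIVE_TEPS:.0e} (~{EFFECTIVE_TEPS / BRAIN_TEPS:.0f}x brain)")
```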
------------------------------------------------
Example points from O3
------------------------------------------------
TEPS
Measuring compute by memory-bound metrics (like Traversed Edges Per Second – TEPS) gives a very different view than FLOPS — and in fact reflects the real bottleneck in most modern workloads, including:
- Graph processing
- Transformer attention
- Sparse matrix ops
- Many real-world ML inference tasks
🧠 What is TEPS?
...
TEPS is especially relevant in:
- Graph analytics (e.g., BFS)
- Sparse ML ops (e.g., GNNs)
- Pointer chasing
- Large transformer inference (token routing, KV lookup)
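(A sketch of mine, not from the o3 transcript.) TEPS is measured Graph500-style: run a breadth-first search and divide the number of edges traversed by the wall-clock time. A minimal illustration:

```python
import random, time
from collections import deque

def measure_teps(adj, source=0):
    """BFS from `source`; TEPS = edges examined / elapsed seconds (Graph500-style)."""
    visited = {source}
    frontier = deque([source])
    edges = 0
    t0 = time.perf_counter()
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            edges += 1                     # every edge examined counts as a traversal
            if v not in visited:
                visited.add(v)
                frontier.append(v)
    return edges / (time.perf_counter() - t0)

# Toy random digraph: 100k vertices, out-degree 16 (real Graph500 runs use billions of edges).
n, deg = 100_000, 16
adj = [[random.randrange(n) for _ in range(deg)] for _ in range(n)]
print(f"{measure_teps(adj):.1e} TEPS")     # pure Python: only ~1e6-1e7
```

Even this toy run makes the scale clear: interpreted Python manages a few million traversals per second, versus the ~1–2 billion for CPUs and ~10–20 billion for an H100 quoted in the benchmark table below.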
🔍 Why TEPS ≠ FLOPS
| FLOPS-heavy | TEPS-heavy |
|---|---|
| Dense matrix ops | Sparse access (graphs, tables) |
| Fixed compute pattern | Irregular memory access |
| Compute-bound | Memory/IO-bound |
In fact, many models today (e.g., MoEs, GNNs, search) are limited by TEPS, not FLOPS.
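(Again my own sketch, not o3's.) The compute-bound vs memory-bound split above is just the roofline model: a kernel only reaches peak FLOPS if it does enough arithmetic per byte moved. Using rough H100-class numbers as assumptions:

```python
# Roofline sketch: a kernel is compute-bound only if its arithmetic intensity
# (FLOPs per byte moved) exceeds peak_flops / peak_bandwidth.
PEAK_FLOPS = 1e15          # ~1 PFLOP/s dense (order of magnitude, assumed)
PEAK_BW = 3.35e12          # ~3.35 TB/s HBM

ridge = PEAK_FLOPS / PEAK_BW              # ~300 FLOPs/byte
for name, intensity in [
    ("large, well-blocked matmul", 500.0),    # hundreds of FLOPs per byte with good reuse
    ("sparse edge traversal", 0.1),           # pointer chasing: almost no arithmetic per byte
]:
    attainable = min(PEAK_FLOPS, intensity * PEAK_BW)
    kind = "compute-bound" if intensity > ridge else "memory-bound"
    print(f"{name}: {attainable:.1e} FLOP/s attainable ({kind})")
```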
🚧 Bottlenecks for TEPS
- External memory bandwidth (DRAM, HBM)
  - Current limit: ~3–5 TB/s (HBM3e on H100, MI300X)
  - Latency to DRAM: 200+ cycles
- Cache + memory hierarchy
  - Random access can’t benefit from prefetching
  - Poor cache reuse kills perf
- On-chip interconnect
  - Even if memory is fast, routing across cores is often slow
- PCIe / NVLink limits
  - TEPS across GPUs/nodes is bottlenecked by I/O fabric
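One reason the first bottleneck dominates (a back-of-envelope of mine, not o3's; the bytes-per-edge figure is an assumption): a few TB/s of HBM divided by the random traffic each traversal generates gives a ceiling of roughly 200 billion TEPS per GPU, the figure used further down.

```python
# Crude ceiling on memory-bound TEPS: HBM bandwidth / bytes touched per edge traversal.
HBM_BW = 3.35e12           # bytes/s on an H100-class part
BYTES_PER_EDGE = 16        # assumed: neighbour index + destination state, random access, no reuse
print(f"TEPS ceiling: {HBM_BW / BYTES_PER_EDGE:.1e}")   # ~2e11, i.e. the ~200 billion figure below
```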
🔋 Scope for TEPS improvement (2025–2030)
| Tech | Estimated Gain | Notes |
|---|---|---|
| HBM3 → HBM4 | ×1.5 | 6–9 TB/s bandwidth per GPU |
| 3D stacked SRAM/cache | ×1.5–2 | Cache-on-cache for low-latency reuse |
| Better memory controllers | ×1.2–1.5 | Less thrashing on sparse access |
| Chiplet interconnects | ×2+ | NVIDIA NVLink Switch, AMD Infinity Fabric |
| In-memory compute (R&D) | ×5–10 (future) | Still experimental, not mainstream by 2030 |
| On-device compression | ×2 effective BW | Especially for attention/key-value cache |
✅ Realistic projection
By 2030, memory-bound systems may gain:
- ×3–5 TEPS improvement (vs today)
- More in IO + cache than DRAM latency
However, gains are not exponential like FLOPS used to be, and most advances depend on:
- Packaging (chiplets, 3D stacking)
- Smarter scheduling & software
- Tighter memory + compute coupling
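One way to see where the ×3–5 comes from (my own arithmetic, not o3's): the rows of the table don't all compound, since several attack the same bottleneck, and multiplying only the bandwidth-side rows already lands in that range.

```python
import math

# Midpoints of the table's ranges (in-memory compute excluded as "not mainstream by 2030").
bandwidth_side = {"HBM3 -> HBM4": 1.5, "better memory controllers": 1.35, "on-device compression": 2.0}
everything = {**bandwidth_side, "3D stacked cache": 1.75, "chiplet interconnects": 2.0}

print("bandwidth-side gains only:", round(math.prod(bandwidth_side.values()), 1))  # ~4x
print("naive product of all rows:", round(math.prod(everything.values()), 1))      # ~14x, too optimistic
```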
📦 Real Hardware Benchmarks
| System | Benchmark | TEPS (approx.) |
|---|---|---|
| NVIDIA H100 | BFS-style graph task | ~10–20 billion TEPS |
| Intel CPUs | Graph500 | ~1–2 billion TEPS |
| Cerebras (sparse) | SpMV | ~100B+ TEPS (claimed in special cases) |
| HPC clusters | Multi-node BFS | 1–10+ trillion TEPS |
------------------------------------------------
How many GPUs to equal the TEPS of the brain?
🧠 Human brain TEPS estimate
One H100 can do ~200 billion TEPS (if memory-bound)
3 · How big a brain-like network could 1 million GPUs simulate?
| Quantity | Per H100 (80 GB) | 1 M GPUs | Notes |
|---|---|---|---|
| RAM for synapses | 80 GB → ~40 B 1-byte synapses | 4×10¹⁶ synapses | 40 × human cortex |
| On-device TEPS | 2×10¹¹ | 2×10¹⁷ | Linear if all work is local |
| Inter-GPU BW | 0.4 TB s⁻¹ NVLink equiv. | 4×10⁵ TB s⁻¹ (aggregate) | Effective TEPS scales sub-linearly ↓ |
Result: the 1 M-GPU datacentre could host ≈ 4 × 10¹⁶ synapses (40× brain) but delivers ∼5 × 10¹⁶ effective TEPS — only 50× brain, not 1 000×, because the network flattens scaling.
I am a bit surprised that you found this post so novel.
I was too! Many of the points were indeed old.
How is this different from what MIRI etc has been saying for ages?
Recently even MIRI seems to have held the position that LLMs might bring us to AGI, and to have been concerned about LLM scaling; e.g. Eliezer’s TIME letter, or Joe Rogero saying to me that:
Anthropic is indeed trying. Unfortunately, they are not succeeding, and they don’t appear to be on track to notice this fact and actually stop.
If Anthropic does not keep up with the reckless scaling of e.g. OpenAI, they will likely cease to attract investment and wither on the vine. But aligning superintelligence is harder than building it. A handful of alignment researchers working alongside capabilities folks aren’t going to cut it. Anthropic cannot afford to delay scaling; even if their alignment researchers advised against training the next model, Anthropic could not afford to heed them for long.
This sounds to me like it’s assuming that if you keep scaling LLMs then you’ll eventually get to superintelligence. So I thought something like “hmm MIRI seems to assume that we’ll go from LLMs to superintelligence but LLMs seem much easier to align than the AIs in MIRI’s classic scenarios and also work to scale them will probably slow down eventually so that will also give us more time”. There’s also been a lot of discussion focused on things like AI 2027 that also assume this. And then when everyone was pointing so intensely at doom-from-LLMs scenarios, it felt easy to only let my attention go to those and then I forgot to think about the case of non-LLM AGI.
If I had, I didn’t remember much of them. Skimming them through, I think the specific position they’re criticizing doesn’t feel very cruxy to me. (Or rather, if Eliezer was right, then that would certainly be a compelling route for AI doom; but there are many ways by which AIs can become more capable than humans, and “having hardware that’s more efficient than the human brain” is only one of them. Computers are already superhuman in a lot of different domains without needing to have a greater hardware efficiency for that.)
But… the success of LLMs is the only reason people have super short timelines! That’s why we’re all worried about them, and in particular about whether they can soon invent a better paradigm, which, yes, may be more efficient and dangerous than LLMs, but presumably requires them to pass human researcher level FIRST, maybe significantly.
If you don’t believe LLMs will scale to AGI, I see no compelling reason to expect another paradigm which is much better to be discovered in the next 5 or 10 years. Neuroscience is a pretty old field! They haven’t figured out the brain’s core algorithm for intelligence yet, if that’s even a thing. Just because LLMs displayed some intelligent behavior before fizzling (in this hypothetical) doesn’t mean that we’re necessarily one simple insight away. So that’s a big sigh of relief, actually.
I see no compelling reason to expect another paradigm which is much better to be discovered in the next 5 or 10 years.
One compelling reason to expect it in the next 5 to 10 years, independent of LLMs, is that compute has just recently gotten cheap enough that you can relatively cheaply do training runs that use as much compute as a human uses (roughly speaking) in a lifetime. Right now, doing 3e23 FLOP (perhaps roughly human-lifetime FLOP) costs roughly $200k, and we should expect that in 5 years it will only cost around $30k.
So if you thought we might achieve AGI around the point when compute gets cheap enough to do lots of experiments with around human level compute and training runs of substantially larger scale, this is now achievable. To put this another way, most of the probability mass of the “lifetime anchor” from the bio anchors report rests in the next 10 years.
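Spelling out the arithmetic behind those figures (a rough reconstruction; the per-GPU throughput and rental price are assumptions):

```python
# Rough reconstruction of "3e23 FLOP costs ~$200k today, ~$30k in 5 years".
LIFETIME_FLOP = 3e23
EFFECTIVE_FLOPS = 1e15          # assumed ~1 PFLOP/s effective per GPU
USD_PER_GPU_HOUR = 2.5          # assumed rental price

gpu_hours = LIFETIME_FLOP / EFFECTIVE_FLOPS / 3600
print(f"~{gpu_hours:,.0f} GPU-hours -> ~${gpu_hours * USD_PER_GPU_HOUR:,.0f} today")  # ~$208,000

# Going from $200k to $30k in 5 years implies this annual price-performance improvement:
print(f"implied improvement: ~{(200_000 / 30_000) ** (1 / 5):.2f}x per year")          # ~1.46x
```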
More generally, we’ll be scaling through a large number of orders of magnitude of compute (including spent on things other than LLMs potentially) and investing much more in AI research.
I don’t think these reasons on their own should get you above ~25% within the next 10 years, but this in combination with LLMs feels substantial to me (especially because a new paradigm could build on LLMs even if LLMs don’t suffice).
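Seems plausible, but not compelling.
Why one human lifetime and not somewhere closer to evolutionary time on log scale?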
Presumably you should put some weight on both perspectives, though I put less weight on needing as much compute as evolution because evolution seems insanely inefficient.
That’s why I specified “close on a log scale.” Evolution may be very inefficient, but it also has access to MUCH more data than a single lifetime.
Yes, we should put some weight on both perspectives. What I’m worried about here is this trend where everyone seems to expect AGI in a decade or so even if the current wave of progress fizzles—I think that is a cached belief. We should be prepared to update.
I don’t expect AGI in a decade or so even if the current wave of progress fizzles. I’d put around 20% over the next decade if progress fizzles (it depends on the nature of the fizzle), which is what I was arguing for.
I’m saying we should put some weight on possibilities near lifetime level compute (in log space) and some weight on possibilities near evolution level compute (in log space).
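I’m not sure we disagree then.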
I suspect this is why many people’s P(Doom) is still under 50%: not so much that ASI probably won’t destroy us, but that we won’t get to ASI at all any time soon. Although I’ve seen P(Doom) given a standard time range of the next 100 years, which is a rather long time! But I still suspect some are thinking mainly about the near future and LLMs without extrapolating much beyond that.
This sounds to me like it’s assuming that if you keep scaling LLMs then you’ll eventually get to superintelligence. So I thought something like “hmm MIRI seems to assume that we’ll go from LLMs to superintelligence but LLMs seem much easier to align than the AIs in MIRI’s classic scenarios and also work to scale them will probably slow down eventually so that will also give us more time.
Yes, I can see that is a downside: if LLMs can’t scale enough to speed up alignment research and are not the path to AGI, then having them aligned doesn’t really help.
My takeaway from Jacob’s work, and my belief, is that you can’t separate hardware and computational topology from capabilities. That is, if you want a system to understand and manipulate a 3D world the way humans and other smart animals do, then you need a large number of synapses arranged in something like a scale-free network design. That means it’s not just bandwidth or TEPS that matter, but also having many long-distance connections so that only a small number of hops is needed between any given pair of neurons (see the toy sketch below). Our current hardware is not set up to simulate this very well, and a single GPU, despite its high FLOPS, can’t get anywhere near the human brain on this measure. Additionally, you need a certain network size before the better architecture even gives an advantage: transformers don’t beat CNNs on vision tasks until the task reaches a certain difficulty. Combined, these lead me to believe that someone with just a GPU or two won’t do anything dangerous with a new paradigm.
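A toy illustration of the "few hops thanks to long-range connections" point (not a brain model; just a Watts–Strogatz small-world graph, which is my choice of stand-in, not Jacob's):

```python
import networkx as nx

n, k = 1_000, 10      # 1,000 nodes, each wired to its 10 nearest neighbours on a ring
lattice = nx.watts_strogatz_graph(n, k, p=0.0, seed=0)                 # local connections only
smallworld = nx.connected_watts_strogatz_graph(n, k, p=0.1, seed=0)    # 10% rewired to long-range links

print("avg hops, local wiring only:    ", round(nx.average_shortest_path_length(lattice), 1))     # ~50
print("avg hops, with long-range links:", round(nx.average_shortest_path_length(smallworld), 1))  # ~4-5
```

The exact numbers don't matter; the point is that a small fraction of long-range edges collapses the hop count between arbitrary pairs of nodes.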
Based on this, the observation that computers are already superhuman in some domains isn’t necessarily a sign of danger: the network required to play Go simply doesn’t need a large, highly connected architecture, because the domain (a small, discrete 2D board) doesn’t require it.
I agree that there is danger, and a crux for me is how much better an ANN can be at, say, science than a biological network, given that we have not evolved to do abstract symbol manipulation. On the one hand, there are brilliant mathematicians who can outcompete everyone else; however, the same does not apply to biology. Some things require calculation and real-world experimentation, and intelligence can’t shortcut them.
If some problems require computation with a specific topology/hardware, then a GPU setup can’t just reconfigure itself and FOOM.