The MoE architecture doesn’t just avoid thrashing weights around. It also reduces the amount of computation per token. For instance, DeepSeek v3.1 has 671B parameters, of which only 37B are activated per token and used in the matrix multiplications. A dense model like GPT-3, by contrast, uses all 175B of its parameters for every token.
IIRC the human brain performs 1E14 -- 1E15 FLOP/second. The authors of the AI-2027 forecast imply that a human brain produces ~10 tokens/sec, i.e. uses 1E13 -- 1E14 computations per token, while having 1E14 synapses.
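To make the comparison concrete, here is a minimal back-of-the-envelope sketch. The "~2 FLOPs per active parameter per forward pass" rule of thumb is my own assumption, not something from the posts being discussed; the rest just restates the figures above.

```python
# Back-of-the-envelope FLOP-per-token comparison (illustrative only; the
# factor of 2 per multiply-accumulate is an assumed rule of thumb).

def flops_per_token(active_params):
    # Rough forward-pass estimate: ~2 FLOPs per active parameter.
    return 2 * active_params

deepseek_moe = flops_per_token(37e9)    # 37B activated out of 671B total
gpt3_dense = flops_per_token(175e9)     # dense model uses all parameters

# Brain estimate from above: 1E14 -- 1E15 FLOP/s at ~10 tokens/s.
brain_low, brain_high = 1e14 / 10, 1e15 / 10

print(f"DeepSeek v3.1 (MoE): ~{deepseek_moe:.1e} FLOP/token")
print(f"GPT-3 (dense):       ~{gpt3_dense:.1e} FLOP/token")
print(f"Human brain:         ~{brain_low:.0e} -- {brain_high:.0e} FLOP/token")
```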
A more detailed analysis of Yudkowsky’s case for FOOM
If the brain were magically accelerated a million times, so that signals traveled at 100 million m/s, it would do 1E20 -- 1E21 FLOP/second while performing 1E17 transitions/sec. Cannell’s case for brain efficiency claims that the fundamental baseline irreversible (nano) wire energy is ~1 Eb/bit/nm, with Eb in the range of 0.1 eV (low reliability) to 1 eV (high reliability). If reliability is low and each transition traverses 1E7 nanometers, or 1 centimeter, of wire, then we need 1E23 eV/second, or about 1E4 joules/second. IMO this implies that Yudkowsky’s case for a human brain accelerated a million times is about as unreliable as Cotra’s case against AI arriving quickly. However, proving that AI is an existential threat is far easier, since it only requires us to construct one such architecture, not to prove that none exists.
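For the record, here is a minimal sketch of the arithmetic behind the 1E4 joules/second figure, under the assumptions stated above (1E17 transitions/sec, 1 cm of wire per transition, Cannell’s low-reliability 0.1 eV/bit/nm bound); treat it as a sanity check rather than a derivation.

```python
# Sanity check of the accelerated-brain wire-energy estimate from the text.

EV_TO_JOULES = 1.602e-19

transitions_per_sec = 1e17   # accelerated brain, as assumed above
wire_length_nm = 1e7         # 1 cm of wire per transition, as assumed above
energy_ev_per_bit_nm = 0.1   # Cannell's low-reliability bound

power_ev_per_sec = transitions_per_sec * wire_length_nm * energy_ev_per_bit_nm
power_watts = power_ev_per_sec * EV_TO_JOULES

print(f"~{power_ev_per_sec:.0e} eV/s ≈ {power_watts:.0e} W")  # ~1E23 eV/s ≈ 1E4 W
```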
Returning to the question of whether the human brain is far more powerful or efficient, we notice that it can’t, say, be copied arbitrarily many times. If it could, one could upload a genius physicist and have an army of their copies work on different projects and exchange insights.
As for humans being “wildly more data efficient”, Cannell’s post implies that AlphaGo disproves this conjecture with regard to narrow domains like games. Where humans are wildly more efficient is in their ability to handle big contexts and to keep information in mind for more than a single forward pass, as I discussed here and in the collapsible section here.
Yeah, sorry. I should’ve been more clear. I totally agree that there are ways in which brains are super inefficient and weak. I also agree that on restricted domains it’s possible for current AIs to sometimes reach comparable data efficiency.