Brain Efficiency Cannell Prize Contest Award Ceremony
Previously, Jacob Cannell wrote the post “Brain Efficiency”, which makes several radical claims: that the brain is at the Pareto frontier of speed, energy efficiency and memory bandwidth, and that this represents a fundamental physical frontier.
Here’s an AI-generated summary:
The article “Brain Efficiency: Much More than You Wanted to Know” on LessWrong discusses the efficiency of physical learning machines. The article explains that there are several interconnected key measures of efficiency for physical learning machines: energy efficiency in ops/J, spatial efficiency in ops/mm^2 or ops/mm^3, speed efficiency in time/delay for key learned tasks, circuit/compute efficiency in size and steps for key low-level algorithmic tasks, and learning/data efficiency in samples/observations/bits required to achieve a level of circuit efficiency, or per unit thereof. The article also explains why brain efficiency matters a great deal for AGI timelines and takeoff speeds, as AGI is implicitly/explicitly defined in terms of brain parity. The article predicts that AGI will consume compute & data in predictable brain-like ways and suggests that AGI will be far more like human simulations/emulations than you’d otherwise expect and will require training/education/raising vaguely like humans.
Considering the intense technical mastery of nanoelectronics, thermodynamics and neuroscience required to assess the arguments here, I concluded that a public debate between experts was called for. This was the start of the Brain Efficiency Prize contest, which attracted over 100 in-depth, technically informed comments.
Now for the winners! Please note that the criterion for winning the contest was bringing in novel and substantive technical arguments, as assessed by me. In contrast, general arguments about the likelihood of FOOM or DOOM, while no doubt interesting, did not factor into the judgement.
And the winners of the Jake Cannell Brain Efficiency Prize contest are
… and Steven Byrnes!
Each has won $150, provided by Jake Cannell, Eli Tyre and myself.
I’d like to heartily congratulate the winners and thank everybody who engaged in the debate. The discussions were sometimes heated but always very informed. I was wowed and amazed by the extraordinary erudition and willingness for honest, compassionate intellectual debate displayed by the winners.
So what are the takeaways?
I will let you be the judge. Again, remember that the choice of winners was based on my (layman) assessment of whether the participant brought in novel and substantive technical arguments and thereby furthered the debate.
The jury was particularly impressed by Byrnes’ patient, open-minded and erudite participation in the debate.
He has kindly written a post detailing his views. Here’s his summary:
Some ways that Jacob & I seem to be talking past each other
I will, however, point to some things that seem to be contributing to Jacob & me talking past each other, in my opinion.
Jacob likes to talk about detailed properties of the electrons in a metal wire (specifically, their de Broglie wavelength, mean free path, etc.), and I think those things cannot possibly be relevant here. I claim that once you know the resistance/length, capacitance/length, and inductance/length of a wire, you know everything there is to know about that wire’s electrical properties. All other information is screened off. For example, a metal wire can have a certain resistance-per-length by having a large number of mobile electrons with low mobility, or it could have the same resistance-per-length by having a smaller number of mobile electrons with higher mobility. And nobody cares which one it is—it just doesn’t matter in electronics.
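To illustrate the screening-off point above, here is a minimal sketch using the textbook Drude-style expression for resistance per length, 1/(q·n·μ·A). The carrier densities, mobilities and cross-section below are made-up illustrative values (assumptions, not measurements of any real interconnect); the only point is that two different microscopic pictures yield the same circuit-level number.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def resistance_per_length(carrier_density, mobility, cross_section_area):
    """Ohms per metre of a uniform wire: 1 / (q * n * mu * A)."""
    conductivity = ELEMENTARY_CHARGE * carrier_density * mobility
    return 1.0 / (conductivity * cross_section_area)


# Hypothetical 50 nm x 50 nm wire cross-section (illustrative assumption).
area = (50e-9) ** 2

# Wire A: many mobile electrons with low mobility.
wire_a = resistance_per_length(8.0e28, 2.0e-3, area)
# Wire B: half as many mobile electrons with twice the mobility.
wire_b = resistance_per_length(4.0e28, 4.0e-3, area)

print(f"wire A: {wire_a:.3e} ohm/m, wire B: {wire_b:.3e} ohm/m")
print("same resistance per length:", math.isclose(wire_a, wire_b, rel_tol=1e-12))
```

Since only the product n·μ enters, any circuit-level calculation that takes resistance per length as input cannot distinguish the two wires.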
I want to talk about wire voltage profiles in terms of the “normal” wire / transmission line formulas (cf. telegrapher’s equations, characteristic impedance, etc.), and Jacob hasn’t been doing that AFAICT. I can derive all those wire-related formulas from first principles (ooh and check out my cool transmission line animations from my days as a Wikipedia editor!), and I claim that those derivations are perfectly applicable in the context in question (nano-sized wire interconnects on chips), so I am pretty strongly averse to ignoring those formulas in favor of other things that don’t make sense to me.
Relatedly, I want to talk about voltage noise in terms of the “normal” electronics noise literature formulas, like Johnson noise, shot noise, crosstalk noise, etc., and Jacob hasn’t been doing that AFAICT. Again, I’m not taking these formulas on faith, I know their derivations from first principles, and I claim that they are applicable in the present context (nano-sized wire interconnects on chips) just like for any other wire. For example, the Johnson noise formula is actually the 1D version of Planck’s blackbody radiation equation, a deep and basic consequence of thermodynamics. I’m thinking here in particular of Jacob’s comment “it accumulates noise on the landauer scale at each nanoscale transmission step, and at the minimal landauer bit energy scale this noise rapidly collapses the bit representation (decays to noise) exponentially quickly”. I will remain highly skeptical of a claim like that unless I learn that it is derivable from the formulas for electrical noise on wires that I can find in the noise chapter of my electronics textbooks.
Jacob wants to describe wires as being made of small (≈1 nm) “tiles”, each of which is a different “bit”, with information flow down wires corresponding to dissipative bit-copying operations, and I reject that picture. For example, take a 100 μm long wire, on which signals propagate at a significant fraction of the speed of light. Now smoothly slew the voltage at one end of the wire from 0 to over the course of 0.1 ns. (In reality, the slew rate is indeed not infinite, but rather limited by transistor capacitance among other things.) Then, as you can check for yourself, the voltage across the entire wire will slew at the same rate at the same time. In other words, a movie of the voltage-vs-position curve on this 100 μm wire would look like a rising horizontal line, not a propagating wave. Now, recall where the Landauer limit comes from: bit-copy operations require kT of energy dissipation, because we go from four configurations (00,01,10,11) to two (00,11). The Second Law of Thermodynamics says we can’t reduce the number of microstates overall, so if the number of possible chip microstates goes down, we need to make up for it by increasing the temperature (and hence number of occupied microstates) elsewhere in the environment, i.e. we need to dissipate energy / dump heat. But in our hypothetical 100 μm long wire above, this analysis doesn’t apply! The different parts of the wire were never at different voltages in the first place, and therefore we never have to collapse more microstates into fewer.
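The “rising horizontal line” claim above can be sanity-checked with a rough timescale comparison: if the end-to-end propagation delay is far shorter than the slew time, the whole wire sits at nearly the same voltage at every instant. The sketch below assumes a signal velocity of half the speed of light; that fraction is an illustrative assumption standing in for “a significant fraction of the speed of light”, not a figure from the discussion.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def propagation_delay(length_m, velocity_fraction):
    """Time for a signal to traverse the wire at the given fraction of c."""
    return length_m / (velocity_fraction * SPEED_OF_LIGHT)


wire_length = 100e-6       # 100 micrometres, from the example in the text
slew_time = 0.1e-9         # 0.1 ns, from the example in the text
velocity_fraction = 0.5    # assumed for illustration

delay = propagation_delay(wire_length, velocity_fraction)
ratio = slew_time / delay

print(f"propagation delay: {delay * 1e12:.2f} ps")
print(f"slew time is about {ratio:.0f} times longer than the delay")
```

Under this assumption the delay is well under a picosecond, more than a hundred times shorter than the slew, which is why the voltage along the wire rises essentially in unison.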
…So anyway, I think our conversation had a bit of an unproductive dynamic where Jacob would explain why what I said cannot possibly be right [based on his “tiles” model], and then in turn I would explain why what he said cannot possibly be right [based on the formulas I like e.g. telegrapher’s equations], and then in turn Jacob would explain why that cannot possibly be right [based on his “tiles” model], and around and around we go.
For their extensive post on the time and energy cost to erase a bit. The jury was particularly impressed by the extensive calculations and experiments that were done.
the last of which is a detailed worked-out example of an LDF5-50A rigid coax cable that would violate Jacob’s postulated bound, see here:
Ethernet cables are twisted pair and will probably never be able to go that fast. You can get above 10 GHz with rigid coax cables, although you still have significant attenuation.
Let’s compute heat loss in a 100 m LDF5-50A, which evidently has 10.9 dB/100 m attenuation at 5 GHz. This is very low in my experience, but it’s what they claim.
Say we put 1 W of signal power at 5 GHz in one side. Because of the 10.9 dB attenuation, we receive 94 mW out the other side, with 906 mW lost to heat.
The Shannon-Hartley theorem says that we can compute the capacity of the wire as $C = B \log_2(1 + S/N)$, where $B$ is the bandwidth, $S$ is received signal power, and $N$ is noise power.
Let’s assume Johnson noise. These cables are rated up to 100 C, so I’ll use that temperature, although it doesn’t make a big difference.
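If I plug in 5 GHz for $B$, 94 mW for $S$ and $k_B T B$ (the Johnson noise power at 100 C over 5 GHz) for $N$, then I get a channel capacity of about 160 Gbit/s.
The heat lost is then 906 mW / (160 Gbit/s) / 100 m ≈ 0.05 fJ/bit/mm. Quite low compared to Jacob’s ~10 fJ/mm “theoretical lower bound.”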
One free parameter is the signal power. The heat loss over the cable is linear in the signal power, while the channel capacity is sublinear, so lowering the signal power reduces the energy cost per bit. It is 10 fJ/bit/mm at about 300 W of input power, quite a lot!
Another is noise power. I assumed Johnson noise, which may be a reasonable assumption for an isolated coax cable, but not for an interconnect on a CPU. Adding an order of magnitude or two to the noise power does not substantially change the final energy cost per bit (0.05 goes to 0.07), however I doubt even that covers the amount of noise in a CPU interconnect.
Similarly, raising the cable attenuation to 50 dB/100 m does not even double the heat loss per bit. Shannon’s theorem still allows a significant capacity. It’s just a question of whether or not the receiver can read such small signals.
The reason that typical interconnects in CPUs and the like tend to be in the realm of 10-100 fJ/bit/mm is because of a wide range of engineering constraints, not because there is a theoretical minimum. Feel free to check my numbers of course. I did this pretty quickly.
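For readers who do want to check the numbers, the coax calculation above can be reproduced with a short script. This is a sketch under the same assumptions as the quoted comment (Johnson noise only, Shannon-Hartley capacity, 5 GHz bandwidth, 100 C); the function name and its parameters are hypothetical. One caveat: converting 10.9 dB directly gives roughly 81 mW received rather than the quoted 94 mW, so the baseline comes out nearer 0.06 than 0.05 fJ/bit/mm, which does not change the conclusion.

```python
import math

BOLTZMANN = 1.380649e-23  # joules per kelvin
FEMTO = 1e-15


def coax_energy_per_bit_per_mm(input_power_w, attenuation_db, length_m=100.0,
                               bandwidth_hz=5e9, temperature_k=373.15,
                               noise_multiplier=1.0):
    """Heat dissipated per bit per millimetre (joules) at Shannon capacity."""
    # Power surviving the cable, from the attenuation in dB over its length.
    received_w = input_power_w * 10 ** (-attenuation_db / 10)
    heat_w = input_power_w - received_w
    # Johnson noise power k_B * T * B, optionally scaled up for noisier settings.
    noise_w = noise_multiplier * BOLTZMANN * temperature_k * bandwidth_hz
    # Shannon-Hartley: C = B * log2(1 + S/N), in bits per second.
    capacity_bps = bandwidth_hz * math.log2(1 + received_w / noise_w)
    return heat_w / capacity_bps / (length_m * 1000.0)


baseline = coax_energy_per_bit_per_mm(1.0, 10.9) / FEMTO
high_power = coax_energy_per_bit_per_mm(300.0, 10.9) / FEMTO
noisy = coax_energy_per_bit_per_mm(1.0, 10.9, noise_multiplier=100.0) / FEMTO
lossy = coax_energy_per_bit_per_mm(1.0, 50.0) / FEMTO

print(f"1 W input, 10.9 dB:        {baseline:.3f} fJ/bit/mm")
print(f"300 W input, 10.9 dB:      {high_power:.1f} fJ/bit/mm")
print(f"1 W input, 100x noise:     {noisy:.3f} fJ/bit/mm")
print(f"1 W input, 50 dB per 100m: {lossy:.3f} fJ/bit/mm")
```

The four cases correspond to the baseline, the roughly 300 W input needed to reach the 10 fJ/bit/mm scale, two orders of magnitude of extra noise, and the 50 dB/100 m attenuation variant discussed above.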
For his continued insistence on making the cruxes explicit between Jake Cannell and his critics, homing in on key disagreements. Ege Erdil first suggested the idea of working out the example of the heat loss of a cable that would show a violation of Jake’s postulated bounds.
Maybe I’m interpreting these energies in a wrong way and we could violate Jacob’s postulated bounds by taking an Ethernet cable and transmitting 40 Gbps of information at a long distance, but I doubt that would actually work.
My own views
I personally moved from being very sympathetic to Jacob Cannell’s point of view to the point of giving a talk on his perspective to being much more skeptical. I was not aware to what degree his viewpoints differed from established physics academia. The tile Landauer model seems likely invalid—at the very least the claims that are made need to be much more explicitly detailed and made subject to peer review.
Still, it seems many of his considerations on the fundamental design trade-offs and limitations for brains and computer architectures are sensible and important. His ideas remain influential on how I think about the future of computer hardware, the brain and the form of superintelligent machines.
I’d like to once again thank all participants in this debate. It is my sincere belief that detailed, in-depth technical discussion is the way to move the debate forward and empower the public and decision-makers in making the right decisions.
I was heartened to see so many extremely informed people engage seriously and dispassionately. I hope that similar prize contests on topics of interest to the future of humanity may similarly spur debate. In particular, I hope to organize a prize contest on Yudkowsky’s views on nanotechnology. Let me know if you are interested in that happening.