To be clear, if I understand you correctly, the easier path to getting most of the 6 OOMs is through optical interconnect or superconducting interconnect, not via making the full jump to reversible computation (though that also doesn’t seem impossible: moving all of it over seems hard, but you could maybe find some way to get a core computation like matrix multiplies into it. I really haven’t thought much about this, though, and this take might be really dumb).
I mean, the easiest solution is just “make it smaller and use active cooling”. The relevant loopholes in Jacob’s argument are in the Density and Temperature section of his Brain Efficiency post.
Jacob is using a temperature formula for blackbody radiators, which is basically just irrelevant to temperature of realistic compute substrate—brains, chips, and probably future compute substrates are all cooled by conduction through direct contact with something cooler (blood for the brain, heatsink/air for a chip). The obvious law to use instead would just be the standard thermal conduction law: heat flow per unit area proportional to temperature gradient.
Jacob’s analysis in that section also fails to adjust for how, by his own model in the previous section, power consumption scales linearly with system size (and also scales linearly with temperature).
Put all that together, and we get:
q/A = C1 · TS · R / R^2 = C2 · (TS − TE) / R
… where:
R is radius of the system
A is surface area of thermal contact
q is heat flow out of system
TS is system temperature
TE is environment temperature (e.g. blood or heat sink temperature)
C1, C2 are constants with respect to system size and temperature
(Of course a spherical approximation is not great, but we’re mostly interested in change as all the dimensions scale linearly, so the geometry shouldn’t matter for our purposes.)
First key observation: all the R’s cancel out. If we scale down by a factor of 2, the power consumption is halved (since every wire is half as long), the area is quartered (so power density over the surface is doubled), and the temperature gradient is doubled since the surface is half as thick. So, overall, equilibrium temperature stays the same as the system scales down.
So in fact scaling down is plausibly free, for purposes of heat management. (Though I’m not highly confident that would work in practice; in particular, I’m least confident about the temperature gradient scaling with system size. If that failed, then the temperature delta relative to the environment would scale at worst ~linearly with inverse size, i.e. halving the size would double the temperature delta.)
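To make the cancellation concrete, here’s a quick numeric sanity check (a sketch with made-up constants c1, c2 and an arbitrary environment temperature, not calibrated to any real system): solving the flux-balance equation above by bisection gives the same equilibrium temperature at different scales.

```python
def flux_generated(T_S, R, c1=1.0):
    """Heat produced per unit surface area: power ~ c1*T_S*R, area ~ R**2."""
    return c1 * T_S * R / R**2

def flux_removed(T_S, R, T_E=310.0, c2=2.0):
    """Conductive flux out, proportional to the gradient (T_S - T_E)/R."""
    return c2 * (T_S - T_E) / R

def equilibrium(R, lo=310.0, hi=10_000.0):
    """Bisect for the temperature where heat removed balances heat generated."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if flux_generated(mid, R) > flux_removed(mid, R):
            lo = mid  # too much heat generated: equilibrium is hotter
        else:
            hi = mid
    return (lo + hi) / 2

# Same equilibrium temperature at full and half scale: the R's cancel.
print(equilibrium(1.0), equilibrium(0.5))
```

With these toy constants the analytic answer is TS = TE · C2/(C2 − C1) = 620 K at every scale, which is exactly the R-independence claimed above.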
On top of that, we could of course just use a colder environment, i.e. pump liquid nitrogen or even liquid helium over the thing. According to this meta-analysis, the average temperature delta between e.g. brain and blood is at most ~2.5 C, so even liquid nitrogen would be enough to achieve ~100x larger temperature delta if the system were at the same temperature as the brain; we don’t even need to go to liquid helium for that.
In terms of scaling, our above formula says that TS will scale proportionally to TE. Halve the environment temperature, halve the system temperature. And that result I do expect to be pretty robust (for systems near Jacob’s interconnect Landauer limit), since it just relies on temperature scaling of the Landauer limit plus heat flow being proportional to temperature delta.
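For concreteness, the arithmetic behind the ~100x figure (using the boiling point of nitrogen and the ~2.5 C brain–blood delta quoted above):

```python
T_brain = 310.0    # K, ~37 C
blood_delta = 2.5  # K, max brain-blood temperature delta per the meta-analysis
T_ln2 = 77.0       # K, boiling point of liquid nitrogen at 1 atm

# Delta if the system stayed at brain temperature but sat in liquid nitrogen:
ln2_delta = T_brain - T_ln2
print(ln2_delta / blood_delta)  # ~93x the brain-blood delta
```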
I mean, the easiest solution is just “make it smaller and use active cooling”.
The brain already uses active liquid cooling of course, so this is just “make it smaller and cool it harder”.
I have not had time to investigate your claimed physics on how cooling scales, but I’m skeptical: pumping a working coolant through the compute volume can only extract a limited, constant amount of heat from the volume per unit of coolant flowing per unit time (this should be obvious?), and thus the amount of heat that can be removed must scale strictly with the surface area (assuming you’ve already maxed out the cooling effect per unit coolant).
So reduce radius by 2x and you reduce surface area and thus heat pumped out by 4x, but only reduce heat production via reducing wire length by at most 2x as I described in the article.
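The competing scalings in this counterargument can be laid out explicitly (a sketch in arbitrary units, assuming heat removal is capped by surface area and heat production is set by wire length, per the argument above):

```python
def heat_removable(R):
    """If coolant extracts a fixed maximum heat per unit area, removal ~ R**2."""
    return R**2

def heat_produced(R):
    """Wire-length argument: power consumption ~ R."""
    return R

for R in (1.0, 0.5, 0.25):
    # Ratio of heat produced to heat removable grows as 1/R when shrinking,
    # so under these assumptions shrinking makes the cooling problem worse.
    print(R, heat_produced(R) / heat_removable(R))
```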
Active cooling ends up using more energy, as you are probably aware. Moving to a colder environment is of course feasible (and used to some extent by some datacenters), but that hardly gets OOM gains on earth.
Well, to be clear, there is no easy path to 6 OOM of further energy efficiency improvement. At a strictly trends-prediction level, that is of the same order as the gap between a 286 and an Nvidia RTX 4090, which took 40 years of civilization-level effort. At a circuit-theory level, the implied ~1e15/s analog synaptic ops in 1e-5 J is impossible without full reversible computing, as interconnect is only ~90% of the energy cost, not 99.999%, and the minimal analog or digital MAC op consumes far more than 0.1 eV. So not only can such a system not run conventional serial algorithms or massively parallel algorithms, it has to use fully reversible parallel logic. As with quantum computing, it’s still unclear what maps usefully onto that paradigm; I’m reasonably optimistic in the long term, but ..
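The energy-per-op arithmetic behind that claim can be checked directly; the implied budget lands within a few times the room-temperature Landauer bound:

```python
import math

E_BUDGET = 1e-5        # J/s, hypothesized brain-parity power after 6 OOM gains
OPS = 1e15             # analog synaptic ops per second
EV = 1.602176634e-19   # J per eV
K_B = 1.380649e-23     # J/K, Boltzmann constant

per_op_ev = E_BUDGET / OPS / EV            # implied energy budget per op
landauer_300K_ev = K_B * 300 * math.log(2) / EV  # min cost to erase one bit

print(f"implied budget: {per_op_ev:.3f} eV/op")     # ~0.062 eV
print(f"Landauer @300K: {landauer_300K_ev:.3f} eV") # ~0.018 eV
```

So the budget is ~0.06 eV per op, only ~3.5x the cost of erasing a single bit at room temperature, which is the sense in which an irreversible MAC (which erases many bits) cannot fit.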
I’m skeptical that even the implied bit-error-correction energy costs would make much sense on the surface of the earth. An advanced quantum or reversible computer’s need for minimal noise, and thus low temperature, to maintain coherence or a low error rate is just a symptom of reaching highly perfected states of matter, where any tiny atomic disturbance can be catastrophic and cause a cascade of expensive-to-erase errors. Ironically, such a computer would likely be much larger than the brain; this appears to be one of the current fundamental tradeoffs with most reversible computation, and it’s not a simple free lunch (optical computers are absolutely enormous, superconducting circuits are large, reversibility increases area, etc.). At scale, such systems would probably only work well off earth, perhaps far from the sun or buried in places like the dark side of the moon, because they become extremely sensitive to thermal noise, cosmic rays, and any disorder. We are talking about arcilect-level tech in 2048 or something, not anything near term.
So instead I expect we’ll have a large population of neuromorphic AGI/uploads well before that.