This estimate is based on various biological comparisons, and for the purposes of this question I want to focus on the most conservative estimate from the report, that coming from how much computation was done by evolution in the history of life.
There is being reasonably conservative, and then there is that. There is a singularly correct way to generate predictive theories (Bayesianism/Solomonoff induction/etc.), and it always involves the product (or, in log-probability bits, the sum) of theory prior complexity and theory postdiction power. The theory based on anchoring to the sum “computation done by evolution in the history of life” completely fails to postdict the success of deep learning, and thus is so wildly improbable as to not be worth any further consideration, and seriously undermines the entire report.
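Concretely, in bits, the score of a theory H against evidence D is

$$-\log_2 P(H \mid D) \;=\; -\log_2 P(H) \;-\; \log_2 P(D \mid H) \;+\; \mathrm{const},$$

i.e. prior complexity bits plus postdiction (fit) bits. A theory that badly fails to postdict the evidence pays an enormous penalty in the second term no matter how short its description is.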
There is a much simpler model which tightly postdicts the evidence to date: AI progress is straightforwardly predicted by the availability of compute equivalent to that required to train some relevant set of functionally similar, minimal-sized brain modules.
brains operate far from the Landauer limit at the bit level (though maybe not if you condition on information transmission via electrons, see this post),
The point of that post was to disprove (and preempt) that common claim about brain energy efficiency, as it’s based on naive, oversimplified layman models (which any good semiconductor engineer would already scoff at, but the wisdom from that field is not widely enough distributed). Perhaps I didn’t communicate the simplest form of the argument clearly enough, but it’s something like this:
First, you can’t compare complex, high-level analog synaptic ops or floating-point ops (which are used to simulate synaptic ops) to low-level binary switch ops: each of the former complex elements is built out of orders of magnitude more of the simpler elements.
Secondly, any real computer/brain is ultimately built out of minimal atomic elements which are themselves also computers to which the Landauer limit applies. So a computer could only operate directly at the (error-rate-corrected) thermodynamic limit implied by counting only its switching elements (transistors) if it required no wires/interconnect. But in actuality any real computer/brain built out of such minimal elements ends up being mostly wires in order to be useful. Worse yet, the thermal output of a maximally miniaturized machine (miniaturized to minimize interconnect cost) would be far beyond practical cooling capacity, and worse still, the required cooling pipes (for the otherwise optimal 3D layout, as used in brains) would then inflate the volume and thus the interconnect. Or you go 2D (as in silicon chips) and can cool using a heatsink in the vertical dimension, but then you have much worse interconnect scaling.
Or to put it another way: the latest ‘4 nm’ foundry tech is now within about 10x of minimal feature scaling, but energy efficiency scaling is already slowing down, and we still haven’t matched the brain there. The probability that biological evolution (which quickly evolved optimal nanotech bots operating right at the Landauer limit) and the vast intelligence of our semiconductor research industry (humanity’s greatest technological achievement by a landslide) just happened to reach about the same physical limits, without those limits being fundamental, is very small.
Case in point: the new RTX 4090, built on the latest TSMC 4N process, has 7.6e10 transistors switching at 2.2e9 Hz while using 450 watts, which works out to roughly 1.7e20 transistor bit erasures per 450 J, or ~2.7e-18 J/erasure. That is only about 17x larger than the Landauer-based bound of ~1 eV (1.6e-19 J) for reliable digital switching, and that is if we wrongly assumed non-dissipative wires. If we assume more realistically that wire/interconnect dissipation uses 90% of the energy, then the switches themselves are already within about 2x of that hard limit.
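Rough numbers, for anyone who wants to check the arithmetic (the obvious caveat being the simplifying assumption that every transistor switches once per cycle):

```python
# Back-of-envelope check of the RTX 4090 figures above.
# Optimistic simplification: every transistor switches once per clock cycle.
transistors = 7.6e10      # TSMC 4N transistor count
clock_hz = 2.2e9          # clock frequency
power_w = 450.0           # board power, J/s

erasures_per_s = transistors * clock_hz         # ~1.7e20 bit erasures per second
j_per_erasure = power_w / erasures_per_s        # ~2.7e-18 J per erasure

reliable_switch_j = 1.6e-19                     # ~1 eV bound for reliable switching
print(j_per_erasure / reliable_switch_j)        # ~17x above the bound
print(0.1 * j_per_erasure / reliable_switch_j)  # ~2x if wires dissipate 90%
```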
This is why future planned process nodes continue to offer die shrinks but increasingly disappointing energy efficiency improvements. It’s also why GPUs increasingly have more flops than can even be usefully shuffled to registers, and why GPU energy use keeps growing exponentially even as features shrink.
If this is true, producing general intelligence can be a much harder problem than this calculation gives it credit for, because anthropic considerations mean we would only be asking the question of how difficult general intelligence is to produce in worlds where general intelligence was actually produced.
If you start doing anthropic adjustments, you don’t just stop there. The vast majority of our measure will be in various simulations, which dramatically shifts everything around to favor histories with life leading to singularities.
The theory based on anchoring to the sum “computation done by evolution in the history of life” completely fails to postdict the success of deep learning, and thus is so wildly improbable as to not be worth any further consideration, and seriously undermines the entire report.
I don’t think the point is that you need that much compute; rather, it’s an upper bound on how much compute you might need. So I don’t understand your argument; it’s not like the report takes this as its central estimate. I don’t think the scaling in performance we’ve seen in the past 10 years, in which training compute got scaled up by 6-7 OOM in total, is strong evidence against training requirements for AGI being around 10^40 FLOP. That question just looks mostly uncertain to me.
The point of that post was to disprove statements like that as they are based on naive oversimplified layman models (which any good semiconductor engineer would already scoff at, but the wisdom from that field is not widely enough distributed).
Again, I don’t think this is particularly relevant to the post. I agree with you that the Landauer limit bound is very loose; that’s the entire reason I cited your post to begin with. I’m not sure why you felt that your message had not been properly communicated. I’ve edited this part of the question to clarify what I actually meant.
However, it’s much easier to justify this bound for a physical computation you don’t understand very well than to justify something that’s tighter, and all you get after correcting for that is probably ~ 5 OOM of difference in the final answer, which I already incorporate in my 10^45 FLOP figure and which is also immaterial to the question I’m trying to ask here.
If you start doing anthropic adjustments, you don’t just stop there. The vast majority of our measure will be in various simulations, which dramatically shifts everything around to favor histories with life leading to singularities.
I strongly disagree with this way of applying anthropic adjustments. I think these should not, in principle, be different from Bayesian updates: you start with some prior (which could be something like a simplicity prior) over all subjective universes you could have observed and update based on what you actually observe. In that case there’s a trivial sense in which the simulation hypothesis is true, because you could always have a simulator that simulates every possible program that halts, or something like that, but that doesn’t help you actually reduce the entropy of your own observations or predict anything about the future, so it is not functional.
I think for this to go through you need to do anthropics using SIA or something similar, and I don’t think that’s justifiable, so I also think this whole argument is illegitimate.
So I don’t understand your argument; it’s not like the report takes this as its central estimate. I don’t think the scaling in performance we’ve seen in the past 10 years, in which training compute got scaled up by 6-7 OOM in total, is strong evidence against training requirements for AGI being around 10^40 FLOP.
My argument is that the report does not use the correct procedure, which is to develop one or a few simple models that best postdict the relevant observed history. Most of a (correct) report would then consist of comparing the postdictions of those simple models to the relevant history (AI progress) in order to adjust hyperparameters and do model selection.
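Schematically, with entirely hypothetical data and toy models just to show the shape of the procedure:

```python
import numpy as np

# Hypothetical illustration of the procedure described above: score a few
# simple candidate models by how well they postdict the observed history of
# AI progress, penalized by their complexity (a crude MDL / Bayes score).

# Placeholder history: (log10 of training compute, observed capability score)
history = [(18, 0.1), (20, 0.25), (22, 0.45), (24, 0.7)]

def score(model, n_params, history):
    """Total bits = complexity bits + postdiction (fit) bits; lower is better."""
    complexity_bits = 10.0 * n_params            # crude prior: ~10 bits per parameter
    sq_err = sum((model(c) - y) ** 2 for c, y in history)
    fit_bits = len(history) * 0.5 * np.log2(sq_err / len(history) + 1e-12)
    return complexity_bits + fit_bits

# Two toy candidates: capability linear in log-compute vs. a hard threshold
# at 1e40 FLOP (an "evolution anchor"-style model).
linear = lambda c: 0.1 * (c - 17)
threshold = lambda c: 1.0 if c >= 40 else 0.0

for name, model, k in [("linear-in-log-compute", linear, 2),
                       ("threshold-at-1e40-FLOP", threshold, 1)]:
    print(name, round(score(model, k, history), 1))
```

The point is just that model selection happens by total score against the actual history, not by picking an anchor a priori.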
However, it’s much easier to justify this bound for a physical computation you don’t understand very well than to justify something that’s tighter, and all you get after correcting for that is probably ~ 5 OOM of difference in the final answer, which is immaterial to the question I’m trying to ask here.
Fair.
If you start doing anthropic adjustments, you don’t just stop there. The vast majority of our measure will be in various simulations, which dramatically shifts everything around to favor histories with life leading to singularities.
I strongly disagree with this way of applying anthropic adjustments. I think these should not, in principle, be different from Bayesian updates: you start with some prior (which could be something like a simplicity prior) over all subjective universes you could have observed and update based on what you actually observe. In that case there’s a trivial sense in which the simulation hypothesis is true, because you could always have a simulator that simulates every possible program that halts, or something like that, but that doesn’t help you actually reduce the entropy of your own observations or predict anything about the future, so it is not functional.
The optimal inference procedure (Solomonoff induction in binary logic form, equivalent to full Bayesianism) is basically what you describe: form a predictive distribution over all computable theories ranked by total entropy (posterior fit + complexity prior). I agree that this probably does lead to accepting the simulation hypothesis, because most of the high-fit submodels based on extensive physics sims will likely locate observers in simulations rather than in root realities.
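In symbols, the predictive distribution is roughly

$$P(x_{t+1} \mid x_{1:t}) \;\propto\; \sum_{H} 2^{-K(H)}\, P(x_{1:t+1} \mid H),$$

where the sum runs over computable hypotheses H and K(H) is the description length of H: hypotheses that both compress well and fit the observed history dominate the prediction.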
The anthropic update is then an update from approximate predictive models which don’t feature future sims to ones that do.
I don’t understand what you mean by “doesn’t help you actually reduce the entropy of your own observations”, as that’s irrelevant. The anthropic update to include the sim hypothesis is not an update to the core ideal predictive models themselves (as those are physics); it’s an update to the approximations we naturally must use to predict the far future.
I think for this to go through you need to do anthropics using SIA or something similar
I don’t see the connection to SIA, and regardless of that philosophical confusion, there is only one known inference method that is universally correct in the limit, so the question is always just: what would a computationally unbound Solomonoff inducer infer?