The tl;dr is what I wrote: learning cycles would be hours or days, and a foom would require hundreds or thousands of learning cycles at minimum.
There is just no plausible way for an intelligence to magic itself to superintelligence in less than large human timescales.
Much depends on what you mean by “learning cycle”—do you mean a complete training iteration (essentially a lifetime) of an AGI? Grown from seed to adult?
I’m not sure where you got the ‘hundreds to thousands’ of learning cycles from either. If you want to estimate the full experimental iteration cycle count, it would probably be better to estimate from smaller domains. Like take vision—how many full experimental cycles did it take to get to current roughly human-level DL vision?
It’s hard to say exactly, but it’s roughly on the order of ‘not many’: we achieved human-level vision with DL very soon after the hardware capability arrived.
If we look at the brain, vision accounts for at least 10% of its total computational cost, and the brain uses the same learning mechanisms and circuit patterns to solve vision as it uses for essentially everything else.
Likewise, once we (roughly) solved vision in the very general way the brain does, those same general techniques turned out to work for essentially all other domains.
Oh, that’s easy: as soon as you get one adult human-level AGI running compactly on a single GPU, you can trivially run it 100x faster on a supercomputer, and/or replicate it a million-fold or more. That generation of AGI then quickly produces the next, and then singularity.
It’s slow going until we get up to that key threshold of brain-compute parity, but once you pass it we probably go through a phase transition in history.
Citation on plausibility severely needed, which is the point.
While that particular discussion is quite interesting, it’s irrelevant to my point above—which is simply that once you achieve parity, it’s trivially easy to get at least weak superhuman performance through speed.
The whole issue is whether a hard takeoff is possible and/or plausible, presumably with currently available computing technology. Certainly with Landauer-limit computing technology it would be trivial to simulate billions of human minds in the space and energy usage of a single biological brain. If such technology existed, yes a hard takeoff as measured from biological-human scale would be an inevitability.
But what about today’s technology? The largest supercomputers in existence can maaaaybe simulate a single human mind at highly reduced speed and with heavy approximation. A single GPU wouldn’t even come close in either storage or processing capacity.

The human brain has about 100bn neurons and operates at 100Hz. The NVIDIA Tesla K80 has 8.73 TFLOPS of single-precision performance with 24GB of memory. That’s 1.92 bits per neuron and 0.87 floating-point operations per neuron-cycle. Sorry, no matter how you slice it, neurons are complex things that interact in complex ways. There is just no possible way to do a full simulation with ~2 bits per neuron and ~1 flop per neuron-cycle.

More reasonable assumptions about simulation speed and resource requirements demand supercomputers on the order of the largest our species has in order to do real-time whole-brain emulation. And if such a thing did exist, it’s not “trivially easy” to expand its own computation power: it’s already running on the fastest stuff in existence!
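The per-neuron arithmetic in that comparison can be checked directly; a quick back-of-envelope sketch, using only the figures quoted in the comment (100bn neurons at 100Hz; a K80’s 8.73 TFLOPS and 24GB):

```python
# Naive accounting: spread the K80's memory and FLOPS across a whole brain's
# worth of neurons cycling at ~100 Hz. All figures are from the comment above.
neurons = 100e9            # ~100bn neurons
cycle_hz = 100             # ~100 Hz
gpu_flops = 8.73e12        # K80 single-precision FLOPS
gpu_mem_bits = 24e9 * 8    # 24 GB of memory, in bits

bits_per_neuron = gpu_mem_bits / neurons
flops_per_neuron_cycle = gpu_flops / (neurons * cycle_hz)

print(f"{bits_per_neuron:.2f} bits per neuron")                # 1.92
print(f"{flops_per_neuron_cycle:.2f} flops per neuron-cycle")  # 0.87
```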
So with today’s technology, any AI takeoff is likely to be a prolonged affair. This is absolutely certain to be the case if whole-brain emulation is used. So should hard-takeoffs be a concern? Not in the next couple of decades at least.
You are assuming an enormously suboptimal/naive simulation. Sure, if you use a stupid simulation algorithm, the brain seems powerful.
As a sanity check, apply your same simulation algorithm to simulating the GPU itself.
It has 8 billion transistors that cycle at 1 GHz, with a typical fanout of 2 to 4. So that’s more than 10^19 gate ops/second! Far more than the brain...
The brain has about 100 trillion synapses, and the average spike rate is around 0.25 Hz (yes, really). So that’s only about 25 trillion synaptic events/second. Furthermore, the vast majority of those synapses are tiny and activate on an incoming spike with a low probability, around 25% to 30% or so (stochastic connection dropout). The average synapse has an SNR equivalent of 4 bits or less. All of these numbers are well supported by the neuroscience literature.
Thus the brain as a circuit computes with < 10 trillion low-bit ops/second. That’s nothing, even if it’s off by 10x.
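That accounting can be sketched directly, comparing the GPU’s raw gate-op rate against the brain’s synaptic-event rate. Two inputs are choices on my part within the stated ranges: a fanout of 2 (conservative end of 2–4) and a 30% transmission probability (from the 25–30% range above):

```python
# GPU side: transistors * clock * fanout, per the sanity check above.
gpu_transistors = 8e9
gpu_clock_hz = 1e9
gate_fanout = 2                                              # low end of 2-4
gpu_gate_ops = gpu_transistors * gpu_clock_hz * gate_fanout  # 1.6e19 ops/s

# Brain side: synapses * average spike rate, discounted by stochastic
# transmission, per the synapse counts above.
synapses = 100e12
avg_spike_hz = 0.25
transmit_prob = 0.30
synaptic_events = synapses * avg_spike_hz          # 2.5e13 events/s
effective_ops = synaptic_events * transmit_prob    # 7.5e12 low-bit ops/s

print(f"GPU:   {gpu_gate_ops:.1e} gate ops/s")
print(f"brain: {effective_ops:.1e} low-bit ops/s")  # < 10 trillion
```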
Also, synapse memory isn’t so much an issue for ANNs, as weights are easily compressed 1000x or more by various schemes, from simple weight sharing to more complex techniques such as tensorization.
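As a rough illustration of the simplest such scheme, here is a hypothetical weight-sharing sketch (shapes and codebook size are my own, not from the discussion): quantizing a float32 weight matrix to a 16-entry shared codebook gives about 8x on its own; ratios like 1000x come from stacking far more aggressive techniques such as tensorization.

```python
import numpy as np

# Weight sharing: store each weight as a 4-bit index into a 16-entry shared
# codebook instead of a full float32. Sizes below are illustrative.
rng = np.random.default_rng(0)
weights = rng.standard_normal((1024, 1024)).astype(np.float32)

# Uniform codebook over the weight range; nearest-entry assignment.
codebook = np.linspace(weights.min(), weights.max(), 16, dtype=np.float32)
indices = np.abs(weights[..., None] - codebook).argmin(axis=-1).astype(np.uint8)

raw_bits = weights.size * 32
shared_bits = weights.size * 4 + codebook.size * 32   # 4-bit indices + table
print(f"compression: {raw_bits / shared_bits:.1f}x")  # ~8x
```

In practice the codebook would be fit with k-means rather than spaced uniformly, but the storage arithmetic is the same.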
As we now approach the end of Moore’s law, our low-level circuit efficiency has already caught up to the brain, or is close to it. The remaining gap is almost entirely at the level of algorithmic efficiency.
If you are assuming that a neuron contributes less than 2 bits of state (or 1 bit per 500 synapses) and 1 computation per cycle, then you know more about neurobiology than anyone alive.
I don’t understand your statement.
I didn’t say anything in my post above about per-neuron state, because it’s not important. Each neuron is a low-precision analog accumulator of roughly 8-10 bits, and there are 20 billion neurons in the cortex. There are another 80 billion in the cerebellum, but they are unimportant here.
The memory cost of storing the state for an equivalent ANN is far less than the 20 billion bytes or so that implies, because of compression: most of that state is just zero most of the time.
In terms of computation per neuron per cycle: when a neuron fires, it does #fanout computations. Counting from the total synapse numbers is easier than estimating neurons * average fanout, but gives the same result.
When a neuron doesn’t fire... it doesn’t compute anything of significance. This is true in the brain and in all spiking ANNs; it’s equivalent to sparse matrix operations, where the computational cost depends on the number of nonzeros, not on the raw size.
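That sparsity argument can be made concrete with a toy event-driven update: per-tick work is (spikes fired) x (fanout), not anything proportional to the full N x N connectivity. All parameters here are invented for illustration:

```python
import random

# Event-driven propagation: only neurons that spike this tick do any work.
random.seed(0)
n_neurons, fanout = 10_000, 100
# Each neuron projects to `fanout` random targets.
targets = [random.sample(range(n_neurons), fanout) for _ in range(n_neurons)]

def step(spiking, potentials, weight=0.1):
    """One tick of propagation; cost is len(spiking) * fanout synaptic events."""
    ops = 0
    for src in spiking:
        for dst in targets[src]:
            potentials[dst] += weight   # deliver one synaptic event
            ops += 1
    return ops

potentials = [0.0] * n_neurons
active = random.sample(range(n_neurons), 25)   # only 0.25% of neurons fire
ops = step(active, potentials)
print(ops)   # 2500 synaptic events, vs. 10_000 * 10_000 = 1e8 dense updates
```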