Update: Ryan Greenblatt is right that I messed up the numbers; the serial speedup of 2025 LLMs is closer to 100x than 30,000x. Steinhardt says the forward pass per layer is 1-10 microseconds, which puts the forward pass for the entire transformer at 1-10 milliseconds.
Prediction: Serial speedup of LLMs is going to matter way more than parallel speedup
Definition: Serial speedup means running LLM forward passes faster. Parallel speedup means running more copies of the LLM in parallel. Both are paths by which the total system can produce more output than an individual LLM.
Disclaimer
For now, let’s measure progress in a domain where candidate solutions can be verified quickly and cheaply.
Assume fast means less than 1 second of wall clock time. Cheap means less than $0.01 per experiment.
Examples of domains where each “experiment” is fast and cheap: pure math, software, human persuasion, (maybe) AI research, (maybe) nanotechnology
Examples of domains where each experiment is expensive: Particle colliders in experimental particle physics (can cost >$1M per run), cloning experiments in biotech ($100 per run)
Examples of domains where each experiment is slow: Spaceflight (each launch takes years of planning), Archaeology (each excavation takes years), etc
The latter domains will also speed up, of course, but considering the speed and cost of each lab experiment complicates the analysis.
Why does serial speedup matter more?
Verifiers are a bottleneck
Ultimately, no matter how many ideas you search through in your mind, the output is always a decision about the next lab experiment you want to run. You can’t zero-shot a perfect understanding of the universe. You can, however, be far more time-, cost-, and sample-efficient than humans at figuring out the next experiment that will teach you the most about the world.
New ideas build on top of old ideas. Parallel is like generating lots of new ideas at once, then waiting to submit them all to a verifier (like a lab experiment). Serial is like generating an idea, verifying it, generating another, verifying that one, and so on (see the toy sketch after this list).
Empirical evidence: Scientific progress throughout history seems to be accelerating instead of growing linearly, as we make more and more domains verifiable (by inventing instruments such as an electron microscope or cyclotron or DNA sequencer etc)
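To make this concrete, here is a toy simulation (entirely my own construction; the distributions and numbers are made up, not taken from any source). Each generate-and-verify cycle produces some candidate ideas, the verifier reveals the best one, and only then can the next cycle build on it. The best of k random draws grows roughly like log(k), so 10x more parallel ideas per cycle buys far less than 10x more serial cycles.

```python
# Toy model of the verifier bottleneck (my own construction, not from any paper).
# Each serial cycle: generate `parallel_ideas` candidates, let the verifier
# (a "lab experiment") score them, keep the best one, and let the next cycle
# build on it. The max of k exponential draws grows only like ln(k),
# while serial cycles add up linearly.
import random

def total_progress(serial_cycles: int, parallel_ideas: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    knowledge = 0.0
    for _ in range(serial_cycles):
        candidates = [rng.expovariate(1.0) for _ in range(parallel_ideas)]
        knowledge += max(candidates)  # the verifier picks out the best idea this cycle
    return knowledge

print(total_progress(serial_cycles=100, parallel_ideas=1))    # ~100
print(total_progress(serial_cycles=100, parallel_ideas=10))   # ~290: 10x parallel -> ~3x progress
print(total_progress(serial_cycles=1000, parallel_ideas=1))   # ~1000: 10x serial -> ~10x progress
```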
Multi-year focus is rare
Most humans half-ass tasks, get distracted, give up, etc. Once people get good “enough” at a task (to get money, sex, satisfy curiosity, etc.), they stop trying as hard to improve.
(Maybe) Empirical evidence: If you spend even 10 years of your life consistently putting effort into improving at a task, you can probably rank among the top 1000 people on Earth at that task.
The primary reason I’m not a top-1000 guitarist or neuroscientist or politician is that I don’t care enough to put in the hours. My brain structure is likely not that different from that of people who are good at these tasks; I probably have the basic hardware and algorithms required to get good. Sure, I may not reach the level of Magnus Carlsen with hard work alone, but I could improve a lot with hard work.
Humans live less than 100 years; we don’t really know how much intellectual progress would be possible if a human could think about a problem for, say, 1000 years.
Empirical evidence: We know that civilisations as a whole can survive for 1000 years and make amounts of progress that are unimaginable at the start. No one in year 0 could have predicted year 1000, and no one in year 1000 could have predicted year 2000.
RL/inference scales exponentially
RL/inference scaling grows exponentially in cost, as we all know from log scaling curves: output grows roughly with the log of compute, so 10x more compute for RL/inference scaling buys only a fixed log(10)-sized increment of output.
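A minimal sketch of what that means numerically (the constants below are made up for illustration, not fit to any real benchmark): if output scales with the log of compute, every 10x of compute buys the same fixed increment.

```python
# Illustrative log-scaling curve (made-up constants, not a real fit):
# output = A + B * log10(compute / baseline). Every 10x in compute adds
# the same +B of output, i.e. exponentially growing cost for linear gains.
import math

BASELINE_FLOP = 1e21  # arbitrary reference point
A, B = 20.0, 10.0     # made-up intercept and slope

def output_score(compute_flop: float) -> float:
    return A + B * math.log10(compute_flop / BASELINE_FLOP)

for compute in [1e21, 1e22, 1e23, 1e24]:
    print(f"{compute:.0e} FLOP -> score {output_score(compute):.0f}")
# 1e+21 -> 20, 1e+22 -> 30, 1e+23 -> 40, 1e+24 -> 50
```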
Parallelising humans is meh
Empirical evidence: We don’t have very good evidence that a country with 10x population produces 10x intellectual output. Factors like culture may be more important. We do have lots of obvious evidence that 10 years of research produces more output than 1 year, and 100 years produces more than 10 years.
It is possible this is similar to the RL/inference scaling curve: maybe 10x more researchers means only log(10) more output.
How much serial speedup is possible?
Jacob Steinhardt says an LLM forward pass can be brought down to below 10 microseconds per token, or 100,000 tokens per second.
A human speaks at 100-150 words per minute, or around 3 tokens per second. This is roughly 30,000x slower (the arithmetic is checked in the sketch at the end of this list).
You could maybe make an argument that the human thought stream actually runs faster than that, and that we think faster than we speak.
At 30,000x speedup, the AI experiences 100 simulated years per day of wall clock time, or 3,000,000 years in 100 years of wall clock time. If you gave me 3,000,000 years to live and progress some field, it is beyond my imagination what I would end up doing.
Even assuming only 100x speedup, the AI experiences 10,000 simulated years per 100 years of wall clock time. Even if you gave me 10,000 years to progress some field, it is beyond my imagination what I would do. (Remember that 100x speedup is way too slow, and 30,000x is closer to actual reality of machine learning.)
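A quick back-of-the-envelope check of these numbers (the token rates are the assumptions stated above, not measurements):

```python
# Back-of-the-envelope check of the speedup arithmetic (token rates are the
# assumptions above, not measurements).
llm_tokens_per_sec = 100_000   # the "below 10 microseconds per token" figure
human_tokens_per_sec = 3       # ~100-150 words per minute of speech

print(llm_tokens_per_sec / human_tokens_per_sec)   # ~33,000, i.e. order 30,000x

def simulated_years(speedup: float, wall_clock_years: float) -> float:
    # Subjective time experienced by a mind running `speedup` times faster.
    return speedup * wall_clock_years

print(simulated_years(30_000, 1 / 365))   # ~82 simulated years per wall-clock day (the ~100 above)
print(simulated_years(30_000, 100))       # 3,000,000 simulated years per century
print(simulated_years(100, 100))          # 10,000 simulated years per century

# In the other direction: at 30,000x, one wall-clock second feels like 30,000
# subjective seconds, roughly 8 hours (the "half a day per second" in the P.S. below).
```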
P.S. On thinking more about it, if you gave me 3 million simulated years per 100 years of wall clock time, I might consider this situation worse than death.
I would have to wait half a day of subjective time per second of wall clock time, or multiple days to move my finger. So my body would be as good as paralysed, from the point of view of my mind. Yes, I could eventually move my body, but would I want to endure the years of simulated time required to get useful bodily movements? This is basically a mind prison.
Also, everybody around me would still be too slow, so I would be as good as the only person alive. No social contact would ever be possible.
I could set up a way to communicate with a computer using eye movements or something, if I could endure the years of living in the mind prison required to do this.
The number one thing that would end the eternal torment would be for me to be able to communicate with another being (maybe even my own clone) that runs at a speed similar to mine. Social contact would help.
Does this idea accelerate capabilities? (Someone might put more money into serial speedup after reading my post.)
Does it accelerate convincing people about AI risk? (It makes it more intuitive to visualise ASI; Yudkowsky uses similar metaphors to describe ASI.)
I honestly have no idea.