Hi Ted,

I will read the article, but there are some rather questionable assumptions here, and I don't see how you could reach these conclusions while also considering them.
| Step | Probability |
| --- | --- |
| We invent algorithms for transformative AGI | 60% |
| We invent a way for AGIs to learn faster than humans | 40% |
| AGI inference costs drop below $25/hr (per human equivalent) | 16% |
| We invent and scale cheap, quality robots | 60% |
| We massively scale production of chips and power | 46% |
| We avoid derailment by human regulation | 70% |
| We avoid derailment by AI-caused delay | 90% |
| We avoid derailment from wars (e.g., China invades Taiwan) | 70% |
| We avoid derailment from pandemics | 90% |
| We avoid derailment from severe depressions | 95% |
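For reference, multiplying these estimates together, as the essay's cascade appears to do if the steps are treated as independent, gives roughly 0.4%; a quick check:

```python
# Sanity check: product of the essay's stated step probabilities.
# Assumes the cascade treats the steps as roughly independent.
steps = {
    "invent transformative AGI algorithms": 0.60,
    "AGIs learn faster than humans": 0.40,
    "inference below $25/hr": 0.16,
    "cheap, quality robots": 0.60,
    "massively scale chips and power": 0.46,
    "no regulatory derailment": 0.70,
    "no AI-caused delay": 0.90,
    "no war derailment": 0.70,
    "no pandemic derailment": 0.90,
    "no severe depression": 0.95,
}

p = 1.0
for prob in steps.values():
    p *= prob
print(f"Joint probability: {p:.4f}")  # ~0.0040, i.e. roughly 0.4%
```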
We invent algorithms for transformative AGI:
- Have you considered recursive self-improvement (RSI)? RSI in this context would be an algorithm that says: "given a benchmark that objectively measures whether an AI is transformative, propose a cognitive architecture for an AGI, using a model with sufficient capabilities to make a reasonable guess". You then train the AGI candidate from the cognitive architecture (most architectures will reuse pretrained components from prior attempts) and benchmark it. You maintain a "league" of multiple top-performing AGI candidates, and in each cycle every one of them examines all the results, develops theories, and proposes the next candidate architecture.
Because RSI relies on criticality (each generation of candidates producing a better next generation), it primarily depends on compute availability. Assuming sufficient compute, it would discover a transformative AGI algorithm quickly. Some iterations might take one month (training time for a LLaMA-scale model from randomly initialized weights), others would take less than a day, and many iterations would be attempted in parallel, as the sketch below illustrates.
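To make the league loop concrete, here is a minimal sketch. Every function here is a hypothetical stand-in for a real training-and-benchmarking pipeline, not an existing API:

```python
import random

LEAGUE_SIZE = 4

def propose_architecture(proposer, history):
    """Hypothetical: a capable model examines all prior results and
    proposes the next candidate architecture (reusing pretrained parts)."""
    return {"parent": proposer["id"], "spec": f"arch-{random.randint(0, 1_000_000)}"}

def train_and_benchmark(architecture):
    """Hypothetical: train the candidate, then score it on the
    'is this AI transformative?' benchmark."""
    return random.random()  # stand-in for a real benchmark score

# Seed league of top-performing candidates.
league = [{"id": i, "spec": f"seed-{i}", "score": 0.0} for i in range(LEAGUE_SIZE)]
history = []

for cycle in range(10):  # each cycle could take a day to a month of wall-clock time
    # Every league member proposes a successor; proposals run in parallel in practice.
    candidates = [propose_architecture(member, history) for member in league]
    for cand in candidates:
        cand["score"] = train_and_benchmark(cand)
        cand["id"] = len(history)
        history.append(cand)
    # Keep only the top performers across the old league plus new candidates.
    league = sorted(league + candidates, key=lambda c: c["score"], reverse=True)[:LEAGUE_SIZE]

print("Best benchmark score so far:", league[0]["score"])
```

The point is structural: each cycle's proposers see the full history of results, so the loop's throughput is bounded by compute, not by human iteration speed.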
We invent a way for AGIs to learn faster than humans: Why is this even in the table? This would be 1.0, because it is a known fact that AGI learns faster than humans. Again, from the LLaMA training run: the model went from knowing nothing to domain-human level in one month. That's faster. (Requiring far more data than humans isn't an issue.)
AGI inference costs drop below $25/hr (per human equivalent): Well, A100s rent for about $0.87 per hour. A transformative AGI might use 32 A100s: that's $27.84 an hour. Looks like we're at 1.0 on this one also. Note I didn't even bother with a method to use compute more efficiently at inference time. In short: use a low-end, cheap model, plus a second model that assesses, given the current input, how likely it is that the full-scale model would produce a meaningfully different output.

For example, a robot doing a chore uses a low-end, cheap model trained by supervised learning (distillation) from the main model. That model runs in real time on hardware inside the robot. Whenever the input frame compresses poorly through an onboard autoencoder trained on the same supervised-learning training set, the robot pauses what it is doing and queries the main model. (Some systems can get the latency low enough not to need the pause.)

This general approach saves main-model compute: the main model could cost $240 an hour to run, and yet, as long as it is invoked no more than 10% of the time and the local model is very small and cheap, the blended cost hits your $25-an-hour target.

None of this is cutting edge; there are papers published this week on this and related approaches.
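A rough sketch of that gating scheme plus the blended-cost arithmetic. The prices, the threshold, and the reconstruction-error proxy are illustrative assumptions, not measurements:

```python
import numpy as np

# Illustrative assumptions, not measured values.
MAIN_MODEL_COST_PER_HR = 240.0   # e.g. a large multi-GPU deployment
LOCAL_MODEL_COST_PER_HR = 0.50   # small onboard model
ESCALATION_THRESHOLD = 0.15      # autoencoder reconstruction-error cutoff

def reconstruction_error(frame: np.ndarray) -> float:
    """Stand-in for an onboard autoencoder trained on the distillation set.
    High error means the frame looks unlike the training distribution."""
    return float(np.abs(frame - frame.mean()).mean())  # hypothetical proxy

def should_query_main_model(frame: np.ndarray) -> bool:
    # Escalate only when the local model is likely out of its depth.
    return reconstruction_error(frame) > ESCALATION_THRESHOLD

frame = np.random.rand(64, 64)
print("Escalate this frame?", should_query_main_model(frame))

# Blended cost if the main model handles 10% of frames:
escalation_rate = 0.10
blended = LOCAL_MODEL_COST_PER_HR + escalation_rate * MAIN_MODEL_COST_PER_HR
print(f"Blended cost: ${blended:.2f}/hr")  # $24.50/hr, under the $25 target
```

The design choice is simply that the expensive model is billed only for the fraction of inputs the cheap model cannot handle, so the blended rate scales with the escalation rate rather than the big model's sticker price.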
We invent and scale cheap, quality robots: Given the recent layoffs in the SOTA robotics teams, and how little investor interest Boston Dynamics has drawn, yes, this one is plausibly an open question.
We massively scale production of chips and power: I will have to check the essay to see how you define “massively”.
We avoid derailment by human regulation: This is a parallel probability. It is really 1 − (probability that ALL regulatory agencies in power blocs capable of building AGI regulate it to extinction), i.e. 1 − P(EU AND USA AND China AND <any other parties> all block tAGI research). The race dynamics mean that defection is rapid, so the blocs fail rapidly, even under merely suspected defection: if one party believes the others might be getting tAGI, it may start a national-defense project to rush-build its own, a project exempt from said regulation.
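To illustrate the "parallel probability" structure with a toy calculation (the per-bloc numbers are made up purely for illustration):

```python
# Toy model: regulatory derailment requires EVERY capable bloc to sustain a ban.
# Per-bloc probabilities below are purely illustrative, not estimates.
p_bloc_sustains_ban = {"EU": 0.5, "USA": 0.6, "China": 0.3}

p_all_block = 1.0
for p in p_bloc_sustains_ban.values():
    p_all_block *= p  # assumes the blocs decide independently

p_avoid_derailment = 1 - p_all_block
print(f"P(avoid regulatory derailment) = {p_avoid_derailment:.2f}")  # 0.91
```

Even with individually pessimistic per-bloc numbers, the AND structure pushes the probability of avoiding derailment well above a figure like 70%.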
We avoid derailment by AI-caused delay: Another parallel probability: derailment requires both that an "AI Chernobyl" happens and that all parties respect the lesson of it.
We avoid derailment from wars (e.g., China invades Taiwan): Another parallel probability, as Taiwan is not the only place on Earth capable of manufacturing chips for AI.
We avoid derailment from pandemics: So a continuous series of pandemics, not a three-year dip to ~80% productivity? Due to RSI, even 80% productivity may be sufficient.
We avoid derailment from severe depressions: Another parallel probability.
With that said: on a 15-year horizon, I find the conclusion of "probably not" (tAGI that can do almost everything better than humans, including robotics, that is cheap per hour, and with massive amounts of new infrastructure built) to probably be correct. There are so many steps, and even if you are completely wrong on one of them due to the above or black swans, that doesn't mean a different step won't be rate-limiting. Worlds where we have AGI that is still "transformative" by the plain meaning of the term, where the world is hugely different and AGI receives immense financial investment as the most valuable industry on Earth, are entirely possible without it actually satisfying the end conditions this essay is aimed at.
> We invent a way for AGIs to learn faster than humans: Why is this even in the table? This would be 1.0 because it's a known fact, AGI learns faster than humans.
100% feels overconfident. Some algorithms learning some things faster than humans is not proof that AGI will learn all things faster than humans. Just look at self-driving. It’s taking AI far longer than human teenagers to learn.
> AGI inference costs drop below $25/hr (per human equivalent): Well, A100s rent for about $0.87 per hour. A transformative AGI might use 32 A100s: that's $27.84 an hour. Looks like we're at 1.0 on this one also.
100% feels overconfident. We don't know whether transformative AGI will need 32 A100s, or more. Our essay explains why we think it's more. Even if you disagree with us, I struggle to see how you can be 100% sure.
> Just look at self-driving. It's taking AI far longer than human teenagers to learn.
Teenagers generally don’t start learning to drive until they have had fifteen years to orient themselves in the world.
AI and teenagers are not starting from the same point, so the comparison does not map very well.