Couldn’t the Amdahl’s Law argument work in the opposite direction (i.e. even shorter timelines)?
Suppose AI R&D conducted by humans would take 100 years to achieve ASI. By Amdahl’s Law, there is likely some critical aspect of research that humans are particularly bad at, that causes the research to take a long time. An SAR might be good at the things humans are bad at (in a way that can’t be fixed by humans + AI working together—human + calculator is much better at arithmetic than human alone, but human + AlphaGo isn’t better at Go). So SAR might be able to get ASI in considerably less than 100 human-equivalent-years.
It seems to me that, a priori, we should expect Amdahl’s Law to affect humans and SARs to the same degree, so it shouldn’t change our time estimate. Unless there is some specific reason to believe that human researchers are less vulnerable to Amdahl’s Law; I don’t know enough to say whether that’s true.
That’s not how the math works. Suppose there are 200 activities under the heading of “AI R&D” that each comprise at least 0.1% of the workload. Suppose we reach a point where AI is vastly superhuman at 150 of those activities (which would include any activities that humans are particularly bad at), moderately superhuman at 40 more, and not much better than human (or even worse than human) at the remaining 10. Those 10 activities where AI is not providing much uplift comprise at least 1% of the AI R&D workload, and so progress can be accelerated at most 100x.
This is oversimplified; there is some room for superhuman ability in one area (making excellent choices of experiments to run) to compensate for lack of uplift in another (time to code and execute individual experiments). But the fundamental point remains: a complex process can be bottlenecked by its slowest step. Amdahl’s Law is not symmetric: a chain can’t be as strong as its strongest link.
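(To make the arithmetic concrete, here is a minimal sketch of the generalized Amdahl’s Law calculation. The workload shares and per-activity speedups are illustrative assumptions chosen to match the hypothetical above, not estimates of anything real.)

```python
# Generalized Amdahl's Law: overall speedup = 1 / sum_i(share_i / speedup_i).
# The shares and per-activity speedups below are illustrative assumptions only,
# loosely matching the hypothetical above (150 activities with vast uplift,
# 40 with moderate uplift, 10 with little or none, each >= 0.1% of the work).
vastly_superhuman   = [(0.0056, 1000.0)] * 150   # (workload share, speedup)
moderately_better   = [(0.00375, 10.0)] * 40
little_or_no_uplift = [(0.001, 1.0)] * 10

activities = vastly_superhuman + moderately_better + little_or_no_uplift
assert abs(sum(share for share, _ in activities) - 1.0) < 1e-9

overall = 1.0 / sum(share / speedup for share, speedup in activities)
cap = 1.0 / sum(share for share, speedup in activities if speedup <= 1.0)

print(f"overall speedup: ~{overall:.1f}x")            # ~38.7x with these numbers
print(f"cap from the 1% with no uplift: {cap:.0f}x")  # 100x
```

With these made-up numbers, the 10 low-uplift activities alone cap the overall acceleration at 100x, and the achieved speedup comes out well below even that.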
Another way to put this disagreement is that you can interpret all of the AI 2027 capability milestones as referring to the level of the weakest bottlenecking capability, so:
Superhuman coder has to dominate all research engineers at all pure research engineering tasks. This includes the most bottlenecking capability.
SAR has to dominate all human researchers, which must include whatever task would otherwise bottleneck.
SIAR (superintelligent AI researcher) has to be so good at AI research that it has a huge advantage over the SAR despite the potentially bottlenecking capabilities: the gap between SAR and SIAR is 2x the gap between an automated median AGI company researcher and a SAR.
So, I think perhaps what is going on is that you mostly disagree with the human-only, software-only times and are plausibly mostly on board otherwise.
I think my short, narrowly technical response to this would be “agreed”.
Additional thoughts, which I would love your perspective on:
1. I feel like the idea that human activities involved in creating better models are broader than just, like, stereotypical things an ML Ph.D would do, is under-explored. Elsewhere in this thread you say “my sense is that an SAR has to be better than humans at basically everything except vision.” There’s a lot to unpack there, and I don’t think I’ve seen it discussed anywhere, including in AI 2027. Do stereotypical things an ML Ph.D would do constitute 95% of the work? 50%? Less? Does the rest of the work mostly consist of other sorts of narrowly technical software work (coding, distributed systems design, etc.), or is there broad spillover into other areas of expertise, including non-STEM expertise? What does that look like? Etc.
(I try to make this point a lot, generally don’t get much acknowledgement, and as a result have started to feel a bit like a crazy person. I appreciate you giving some validation to the idea. Please let me know if you suspect I’ve over-interpreted that validation.)
1a. Why “except vision”? Does an SAR have to be superhuman at creative writing, so that it can push forward creative writing capabilities in future models? (Obviously, substitute any number of other expertise domains for “creative writing”.) If yes, then why doesn’t it also need to be superhuman at vision (so that it can push forward vision capabilities)? If no, then presumably creative writing is one of the exceptions implied by the “basically” qualifier; what else falls in there?
2. “Superhuman AI researcher” feels like a very bad term for a system that is meant to be superhuman at the full range of activities involved in producing better models. It strongly suggests a narrower set of capabilities, thus making it hard to hold onto the idea that a broad definition is intended. Less critically, it also seems worthwhile to better define what is meant to fall within the umbrella of “superhuman coder”.
3. As I read through AI 2027 and then wrote my post here, I was confused as to the breadth of skills meant to be implied by “superhuman coder” and (especially) “superhuman AI researcher”, and probably did not maintain a consistent definition in my head, which may have confused my thinking.
4. I didn’t spend much time evaluating the reasoning behind the estimated speedups at each milestone (5x, 25x, 250x, 2000x). I might have more to say after digging into that. If/when I find the time, that, plus the discussion we’ve just had here, might be enough grist for a followup post.
Please let me know if you suspect I’ve over-interpreted that validation.
Slightly? My view is more like:
For AIs to be superhuman AI researchers, they probably need to match humans at most underlying/fundamental cognitive tasks, including reasonably sample-efficient learning. (Or at least learning which is competitive with humans given the AIs’ structural advantages.)
This means they can probably learn how to do arbitrary things pretty quickly and easily.
I think non-ML/software-engineering expertise (that you can’t quickly learn on the job) is basically never important in building more generally capable AI systems aside from maybe various things related to acquiring data from humans. (But IMO this won’t ultimately be needed.)
Does an SAR have to be superhuman at creative writing, so that it can push forward creative writing capabilities in future models?
Do human ML researchers have to be superhuman at creative writing to push forward creative writing capabilities? I don’t particularly think so. Data might need to come from somewhere, but in the vision case, there are plenty of approaches which don’t require AIs with superhuman vision.
In the creative writing case, it’s a bit messy because the domain is intrinsically subjective. I nonetheless think you could make an AI which is superhuman at creative writing without a good understanding of creative writing, using just the (vast, vast) quantity of data we already have on the internet.
Thanks.
I’m now very strongly feeling the need to explore the question of what sorts of activities go into creating better models, what sorts of expertise are needed, and how that might change as things move forward. Which unfortunately I know ~nothing about, so I’ll have to find some folks who are willing to let me pick their brains...
I think this is a good question. I’d love to hear what people with experience building frontier models have to say about it.
Meanwhile, my first pass at decomposing “activities that go into creating better models” into some distinct components that might be relevant in this discussion:
1. Core algorithmic R&D: choose research questions, design & execute experiments, interpret findings
2. ML engineering: build & maintain distributed training setup, along with the infra and dev ops that go along with a complex software system
3. Data acquisition and curation: collect, filter, clean datasets; hire humans to produce/QA; generate synthetic data
4. Safety research and evaluation: red-teaming, interpretability, safety-specific evals, AI-assisted oversight, etc.
5. External productization: product UX and design, UX-driven performance optimization, legal compliance and policy, marketing, and much more.
6. Physical compute infrastructure: GPU procurement, data center building and management, power procurement, likely various physical logistics.
(I wonder what’s missing from this?)
Eli suggested above that we should bracket the issue of data. And I think it’s also reasonable to set aside 4 and 5 if we’re trying to think about how quickly a lab could iterate internally.
If we do that, we’re left with 1, 2, and 6. I think 1 and 2 are covered even by a fairly narrow definition of “superhuman (AI researcher + coder)”. I’m uncertain what to make of 6, besides having a generalized “it’s probably messier and more complicated than I think” kind of feeling about it.
SAR has to dominate all human researchers, which must include whatever task would otherwise bottleneck.
This, and the same description for the other milestones, aren’t completely right; it’s possible that there are some activities on which the SAR is worse. But it can’t be many activities and it can’t be much worse at them, given that the SAR needs to overall be doing the job of the best human researcher 30x faster.
I think my description is consistent with “some activities on which the SAR is worse” as long as these aren’t bottlenecking and it is overall dominating human researchers (as in, adding human researchers is of negligible value).
But whatever, you’re the author here.
Maybe “Superhuman coder has to dominate all research engineers at all pure research engineering tasks” is too strong though.
Ok yeah, seems like this is just a wording issue and we’re on the same page.
Hmm, I think your argument is roughly right, but missing a key detail. In particular, the key aspect of the SARs (and higher levels of capability) is that they can be strictly better than humans at everything while simultaneously being 30x faster and 30x more numerous. (Or, there is 900x more parallel labor, but we can choose to run this as 30x more parallel instances each running 30x faster.)
So, even if these SARs are only slightly better than humans at these 10 activities and these activities don’t benefit from parallelization at all, they can still do them 30x faster!
So, progress can actually be accelerated by up to 3000x even if the AIs are only as good as humans at these 10 activities and can’t productively dump in more labor.
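(Continuing the illustrative arithmetic from the earlier sketch, with the same assumed 1% bottleneck share: if those bottleneck activities still get a 30x serial speedup, the Amdahl cap moves from 100x to 3000x.)

```python
# Same generalized Amdahl's Law, but the 1% bottleneck share (an assumption
# carried over from the sketch above) is now done 30x faster rather than at 1x,
# and everything else is treated as effectively unbounded.
bottleneck_share = 0.01   # the 10 activities with the least uplift
serial_speedup = 30.0     # SARs merely matching human ability, but running 30x faster

cap = 1.0 / (bottleneck_share / serial_speedup)
print(f"max acceleration: {cap:.0f}x")  # 3000x
```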
In practice, I expect that you can often pour more labor into whatever bottlenecks you might have. (And compensate etc as you noted.)
By the time the AIs have a 1000x AI R&D multiplier, they are running at 100x human speed! So, I don’t think the argument for “you won’t get 1000x uplift” can come down to an Amdahl’s Law argument about the automation itself. It will have to depend on compute bottlenecks.
(My sense is that the progress multipliers in AI 2027 are too high but also that the human-only times between milestones are somewhat too long. On net, this makes me expect somewhat slower takeoff with a substantial chance on much slower takeoff.)
This is valid for activities which benefit from speed and scale. But when output quality is paramount, speed and scale may not always provide much help?
My mental model is that, for some time to come, there will be activities where AIs simply aren’t very competent at all, such that even many copies running at high speed won’t provide uplift. For instance, if AIs aren’t in general able to make good choices regarding which experiments to run next, then even an army of very fast poor-experiment-choosers might not be worth much; we might still need to rely on people to choose experiments. Or if AIs aren’t much good at evaluating strategic business plans, it might be hard to train AIs to be better at running a business (a component of the SIAR → ASI transition) without relying on human input for that task.
For Amdahl’s Law purposes, I’ve been shorthanding “incompetent AIs that don’t become useful for a task even when taking speed + scale into account” as “AI doesn’t provide uplift for that task”.
EDIT: of course, in practice it’s generally at least somewhat possible to trade speed+scale for quality, e.g. using consensus algorithms, or generate-and-test if you have a good way of identifying the best output. So a further refinement is to say that very high acceleration requires us to assume that this does not reach importantly diminishing returns in a significant set of activities.
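(A toy sketch of the generate-and-test idea, with a hypothetical generator and scorer that are purely illustrative: extra parallel copies buy quality only insofar as the scoring function tracks what you actually care about, which is where the diminishing returns would show up.)

```python
import random

def generate_and_test(generate, score, n):
    """Trade parallel scale for quality: draw n candidates, keep the highest-scoring.
    This only helps insofar as `score` actually tracks the quality you care about."""
    return max((generate() for _ in range(n)), key=score)

# Toy stand-ins for a real generator and verifier (illustrative assumptions only).
best = generate_and_test(
    generate=lambda: random.gauss(0.0, 1.0),  # each candidate's latent quality
    score=lambda x: x,                        # an unrealistically perfect scorer
    n=64,
)
print(best)  # the best of 64 draws; the gain grows only slowly as n increases
```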
EDIT2:
(My sense is that the progress multipliers in AI 2027 are too high but also that the human-only times between milestones are somewhat too long. On net, this makes me expect somewhat slower takeoff with a substantial chance on much slower takeoff.)
I find this quite plausible.
Sure, but for output quality better than what humans could (ever) do to matter for the relative speed up, you have to argue about compute bottlenecks, not Amdahl’s law for just the automation itself! (As in, if some humans would have done something in 10 years and it doesn’t have any environmental bottleneck, then 10x faster emulated humans can do it in 1 year.)
My mental model is that, for some time to come, there will be activities where AIs simply aren’t very competent at all,
Notably, SAR is defined as “Superhuman AI researcher (SAR): An AI system that can do the job of the best human AI researcher but faster, and cheaply enough to run lots of copies.” So, it is strictly better than the best human researcher(s)! So, your statement might be true, but is irrelevant if we’re conditioning on SAR.
It sounds like your actual objection is in the human-only, software-only time from superhuman coder to SAR (you think this would take more than 1.5-10 years).
Or perhaps your objection is that you think there will be a smaller AI R&D multiplier for superhuman coders. (But this isn’t relevant once you hit full automation!)
Sure, but for output quality better than what humans could (ever) do to matter for the relative speed up, you have to argue about compute bottlenecks, not Amdahl’s law for just the automation itself!
I’m having trouble parsing this sentence… which may not be important – the rest of what you’ve said seems clear, so unless there’s a separate idea here that needs responding to then it’s fine.
It sounds like your actual objection is in the human-only, software-only time from superhuman coder to SAR (you think this would take more than 1.5-10 years).
Or perhaps your objection is that you think there will be a smaller AI R&D multiplier for superhuman coders. (But this isn’t relevant once you hit full automation!)
Agreed that these two statements do a fairly good job of characterizing my objection. I think the discussion is somewhat confused by the term “AI researcher”. Presumably, for an SAR to accelerate R&D by 25x, “AI researcher” needs to cover nearly all human activities that go into AI R&D? And even more so for SIAR/250x. While I’ve never worked at an AI lab, I presume that the full set of activities involved in producing better models is pretty broad, with tails extending into domains pretty far from the subject matter of an ML Ph.D and sometimes carried out by people whose job titles and career paths bear no resemblance to “AI researcher”. Is that a fair statement?
If “producing better models” (AI R&D) requires more than just narrow “AI research” skills, then either SAR and SIAR need to be defined to cover that broader skill set (in which case, yes, I’d argue that 1.5-10 years is unreasonably short for unaccelerated SC->SAR), or if we stick with narrower definitions for SAR and SIAR then, yes, I’d argue for smaller multipliers.
You said “This is valid for activities which benefit from speed and scale. But when output quality is paramount, speed and scale may not always provide much help?”. But, when considering activities that aren’t bottlenecked on the environment, then to achieve 10x acceleration you just need 10x more speed at the same level of capability. In order for quality to be a crux for a relative speedup, there needs to be some environmental constraint (like you can only run 1 experiment).
Is that a fair statement?
Yep, my sense is that an SAR has to[1] be better than humans at basically everything except vision.
(Given this, I currently expect that SAR comes at basically the same time as “superhuman blind remote worker”, at least when putting aside niche expertise which you can’t learn without a bunch of interaction with humans or the environment. I don’t currently have a strong view on the difficulty of matching human visual abilities, particularly at video processing, but I wouldn’t be super surprised if video processing is harder than basically everything else ultimately.)
If “producing better models” (AI R&D) requires more than just narrow “AI research” skills, then either SAR and SIAR need to be defined to cover that broader skill set (in which case, yes, I’d argue that 1.5-10 years is unreasonably short for unaccelerated SC->SAR),
It is defined to cover the broader set? It says “An AI system that can do the job of the best human AI researcher.” (Presumably this implicitly means “any of the best AI researchers”, who presumably need to learn misc skills as part of their jobs, etc.) Notably, Superintelligent AI researcher (SIAR) happens after “superhuman remote worker”, which requires being able to automate any work a remote worker could do.
I’m guessing your crux is that the time is too short?
[1] “Has to” is maybe a bit strong. I think I probably should have said “will probably end up needing to be better than or competitive with the best human experts at basically everything (other than vision), and better at more central AI R&D, given the realistic capability profile”. I think I generally expect full automation to hit everywhere all around the same time, putting aside vision and physical tasks.
We now have several branches going; I’m going to consolidate most of my response in just one branch since they’re converging onto similar questions anyway. Here, I’ll just address this:
But, when considering activities that aren’t bottlenecked on the environment, then to achieve 10x acceleration you just need 10x more speed at the same level of capability.
I’m imagining that, at some intermediate stages of development, there will be skills for which AI does not even match human capability (for the relevant humans), and its outputs are of unusably low quality.
Let’s say out of those 200 activities, (for simplicity) 199 would take humans 1 year, and one takes 100 years. If a researcher AI is only half as good as humans at some of the 199 tasks, but 100x better at the human-bottleneck task, then AI can do in 2 years what humans can do in 100.
Yes, but you’re assuming that human-driven AI R&D is very highly bottlenecked on a single, highly serial task, which is simply not the case. (If you disagree: which specific narrow activity are you referring to that constitutes the non-parallelizable bottleneck?)
Amdahl’s Law isn’t just a bit of math, it’s a bit of math coupled with long experience of how complex systems tend to decompose in practice.