I think my short, narrowly technical response to this would be “agreed”.
Additional thoughts, which I would love your perspective on:
1. I feel like the idea that the human activities involved in creating better models are broader than just, like, the stereotypical things an ML Ph.D. would do is under-explored. Elsewhere in this thread you say “my sense is that an SAR has to be better than humans at basically everything except vision.” There’s a lot to unpack there, and I don’t think I’ve seen it discussed anywhere, including in AI 2027. Do the stereotypical things an ML Ph.D. would do constitute 95% of the work? 50%? Less? Does the rest of the work mostly consist of other sorts of narrowly technical software work (coding, distributed systems design, etc.), or is there broad spillover into other areas of expertise, including non-STEM expertise? What does that look like? Etc.
(I try to make this point a lot, generally don’t get much acknowledgement, and as a result have started to feel a bit like a crazy person. I appreciate you giving some validation to the idea. Please let me know if you suspect I’ve over-interpreted that validation.)
1a. Why “except vision”? Does an SAR have to be superhuman at creative writing, so that it can push forward creative writing capabilities in future models? (Obviously, substitute any number of other expertise domains for “creative writing”.) If yes, then why doesn’t it also need to be superhuman at vision (so that it can push forward vision capabilities)? If no, then presumably creative writing is one of the exceptions implied by the “basically” qualifier; what else falls in there?
2. “Superhuman AI researcher” feels like a very bad term for a system that is meant to be superhuman at the full range of activities involved in producing better models. It strongly suggests a narrower set of capabilities, thus making it hard to hold onto the idea that a broad definition is intended. Less critically, it also seems worthwhile to better define what is meant to fall within the umbrella of “superhuman coder”.
3. As I read through AI 2027 and then wrote my post here, I was confused as to the breadth of skills meant to be implied by “superhuman coder” and (especially) “superhuman AI researcher”, and probably did not maintain a consistent definition in my head, which may have muddled my thinking.
4. I didn’t spend much time evaluating the reasoning behind the estimated speedups at each milestone (5x, 25x, 250x, 2000x). I might have more to say after digging into that. If/when I find the time, that, plus the discussion we’ve just had here, might be enough grist for a followup post.
> Please let me know if you suspect I’ve over-interpreted that validation.
Slightly? My view is more like:
For AIs to be superhuman AI researchers, they probably need to match humans at most underlying/fundamental cognitive tasks, including reasonably sample-efficient learning. (Or at least learning which is competitive with humans given the AIs’ structural advantages.)
This means they can probably learn how to do arbitrary things pretty quickly and easily.
I think non-ML/software-engineering expertise (that you can’t quickly learn on the job) is basically never important in building more generally capable AI systems, aside from maybe various things related to acquiring data from humans. (But IMO this won’t ultimately be needed.)
> Does an SAR have to be superhuman at creative writing, so that it can push forward creative writing capabilities in future models?
Do human ML researchers have to be superhuman at creative writing to push forward creative writing capabilities? I don’t particularly think so. Data might need to come from somewhere, but in the vision case, there are plenty of approaches which don’t require AIs with superhuman vision.
In the creative writing case, it’s a bit messy because the domain is intrinsically subjective. I nonetheless think you could make an AI which is superhuman at creative writing, without a good understanding of creative writing, using just the (vast, vast) quantity of data we already have on the internet.
Thanks.
I’m now very strongly feeling the need to explore the question of what sorts of activities go into creating better models, what sorts of expertise are needed, and how that might change as things move forward. Which unfortunately I know ~nothing about, so I’ll have to find some folks who are willing to let me pick their brains...
I think this is a good question. I’d love to hear what people with experience building frontier models have to say about it.
Meanwhile, my first pass at decomposing “activities that go into creating better models” into some distinct components that might be relevant in this discussion:
1. Core algorithmic R&D: choose research questions, design & execute experiments, interpret findings
2. ML engineering: build & maintain the distributed training setup, along with the infra and dev ops that go along with a complex software system
3. Data acquisition and curation: collect, filter, and clean datasets; hire humans to produce/QA; generate synthetic data
4. Safety research and evaluation: red-teaming, interpretability, safety-specific evals, AI-assisted oversight, etc.
5. External productization: product UX and design, UX-driven performance optimization, legal compliance and policy, marketing, and much more.
6. Physical compute infrastructure: GPU procurement, data center building and management, power procurement, and likely various physical logistics.
(I wonder what’s missing from this?)
Eli suggested above that we should bracket the issue of data. And I think it’s also reasonable to set aside 4 and 5 if we’re trying to think about how quickly a lab could iterate internally.
If we do that, we’re left with 1, 2, and 6. I think 1 and 2 are covered even by a fairly narrow definition of “superhuman (AI researcher + coder)”. I’m uncertain what to make of 6, besides having a generalized “it’s probably messier and more complicated than I think” kind of feeling about it.