Great post! I agree with everything you say in the first section. I disagree with your bottlenecks / Amdahl's law objection for the reasons Ryan mentions; I think our analysis stands firm and takes those bottlenecks into account. (Though to be clear, we are very uncertain, and more research is needed.) As for Hofstadter's law, I think it is basically just the planning fallacy, and yeah, I think it's a reasonable critique that insofar as our AI timelines are formed by doing something that looks like planning, we probably have a bias we need to correct for. I want to think more about the extent to which our timelines methodology is analogous to planning.
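To make the bottleneck point concrete, here is a minimal sketch of the Amdahl's-law objection as I understand it (the fraction and speedup values below are made up for illustration, not taken from our model):

```python
# Minimal sketch of the Amdahl's-law bottleneck intuition. All numbers
# here are illustrative, not figures from our forecast: a fraction `p`
# of AI R&D work is accelerated by factor `s`; the rest stays at
# human speed.

def amdahl_multiplier(p: float, s: float) -> float:
    """Overall speedup when fraction p of the work runs s times faster."""
    return 1.0 / ((1.0 - p) + p / s)

for s in (10, 100, 1000):
    print(f"s={s:>4}: overall multiplier = {amdahl_multiplier(0.9, s):.2f}")

# The output approaches but never exceeds 1 / (1 - p) = 10: however fast
# the automated 90% runs, the remaining 10% caps the overall multiplier.
# That cap is the objection; our response is that the human-only,
# software-only time estimates already account for such serial bottlenecks.
```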
Thanks! I agree that my statements about Amdahl’s Law primarily hinge on my misunderstanding of the milestones, as elucidated in the back-and-forth with Ryan. I need to digest that; as Ryan anticipates, possibly I’ll wind up with thoughts worth sharing regarding the “human-only, software-only” time estimates, especially for the earlier stages, but it’ll take me some time to chew on that.
(As a minor point of feedback, I’d suggest adding a bit of material near the top of the timelines and/or takeoff forecasts, clarifying the range of activities meant to be included in “superhuman coder” and “superhuman AI researcher”, e.g. listing some activities that are and are not in scope. I was startled to see Ryan say “my sense is that an SAR has to be better than humans at basically everything except vision”; I would never have guessed that was the intended interpretation.)
(“Has to” is maybe a bit strong; I think I probably should have said “will probably end up needing to be competitive with the best human experts at basically everything (other than vision) and better at more central AI R&D, given the realistic capability profile”. I generally expect full automation to hit everywhere at around the same time, putting aside vision and physical tasks.)
> As a minor point of feedback, I’d suggest adding a bit of material near the top of the timelines and/or takeoff forecasts, clarifying the range of activities meant to be included in “superhuman coder” and “superhuman AI researcher”, e.g. listing some activities that are and are not in scope. I was startled to see Ryan say “my sense is that an SAR has to be better than humans at basically everything except vision”; I would never have guessed that was the intended interpretation.
This is fair. To the extent we have chosen what activities to include, it’s supposed to encompass everything that any researcher/engineer currently does to improve AIs’ AI R&D capabilities within AGI companies; see the AI R&D progress multiplier definition: “How much faster would AI R&D capabilities...”. As to whether we should include activities that researchers or engineers don’t do, my instinct is mostly no, because the main thing I can think of there is data collection, and that feels like it should be treated separately. (In the AI R&D progress multiplier appendix, we clarify that using new models for synthetic data generation isn’t included in the AI R&D progress multiplier, as we want to focus on improved research skills, though I’m unsure if that’s the right choice and am open to changing it.)
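For concreteness, the multiplier is just a ratio of counterfactual to actual time for the same capability progress; a hedged sketch (the helper name and numbers are hypothetical, not values from the forecast):

```python
# Hedged sketch of the progress multiplier as a ratio (illustrative
# helper and numbers, not from the forecast): how much longer would the
# same AI R&D capability progress have taken with humans only?

def progress_multiplier(human_only_time: float, actual_time: float) -> float:
    """Counterfactual human-only time divided by actual (AI-assisted) time."""
    return human_only_time / actual_time

# E.g. if progress that would take a human-only company 30 weeks
# instead takes 10 weeks, the multiplier is 3x:
print(progress_multiplier(30.0, 10.0))  # 3.0
```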
But I did not put a lot of effort into thinking about how exactly to define the range of applicable activities and which domains should be included. My intuition is that it matters less than you think, both because I expect automation to be less jagged than you do (I might write more about that in a separate comment) and because of intuitions that research taste is the key skill and is relatively domain-general, though I agree expertise helps. I agree that there will be varying multipliers depending on the domain (see the sketch below for how they might compose), but given that the takeoff forecast is focused mostly on a set of AI R&D-specific milestones, I think it makes sense to focus on that.
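Here’s a rough sketch of the composition I have in mind; the domain split and per-domain multipliers are invented for illustration:

```python
# Sketch of how domain-varying multipliers could compose into a single
# overall multiplier, assuming each domain i takes a fixed share w_i of
# researcher time and gets its own multiplier m_i. Shares and values are
# made up for illustration. Time-weighted work composes harmonically
# (Amdahl-style), so the least-accelerated domains dominate.

def overall_multiplier(shares, multipliers):
    assert abs(sum(shares) - 1.0) < 1e-9, "time shares must sum to 1"
    return 1.0 / sum(w / m for w, m in zip(shares, multipliers))

# Hypothetical split: coding 50% at 20x, experiment design 30% at 5x,
# research taste / direction-setting 20% at 2x.
print(round(overall_multiplier([0.5, 0.3, 0.2], [20, 5, 2]), 2))  # ~5.41
```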