I always thought the best argument for superexponentiality was that (even without AI progress feeding back into AI progress) we’d expect AIs to reach an infinite time horizon in finite time once they were better than humans at everything. (And I thought the “meta skills” thing was reasonable as a story of how that could happen.) This is also mentioned in the original METR paper iirc.
But when the AI Futures Project makes that argument, they don’t seem to want to lean very much on it, due to something about the definition. I can’t tell whether the argument is still importantly driving their belief in superexponentiality and this is just a technicality, or whether they think this actually destroys the argument (which would be a major update towards non-superexponentiality for me).
“Another argument for eventually getting superexponentiality is that it seems like superhuman AGIs should have infinite time horizons. However, under the definition of time horizon adapted from the METR report above, it’s not clear if infinite time horizons will ever be reached. This is because AIs are graded on their absolute task success rate, not whether they have a higher success rate than humans. As long as there’s a decreasing trend in ability to accomplish tasks as the time horizon gets longer, the time horizon won’t be infinite. This is something that has been observed with human baseliners (see Figure 16 here). Even if infinite horizons are never reached, the time horizons might get extremely large which would still lend some support to superexponentiality. Even so, it’s unclear how much evidence this is for superexponentiality in the regime we are forecasting in.”
Time horizon is currently defined as the human task length at which the model reaches a 50% success rate, fit with a logistic curve that goes from 0% to 100%. But we only did that because it was a good fit to the data. If METR finds evidence of label noise, or of tasks that are inherently hard to complete with more than human reliability, we can just fit a different logistic curve or switch methodologies (likely at the same time we upgrade our benchmark) so that time horizon more accurately reflects how much intellectual labor the AI can replace before needing human intervention.
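To make the current definition concrete, here’s a minimal sketch of that kind of fit, with made-up success rates and scipy’s generic curve-fitting routine standing in for whatever METR actually uses: fit a logistic in log task length, then read off where it crosses 50%.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (task length in minutes, observed success rate) data for one model.
task_lengths = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480], dtype=float)
success_rate = np.array([0.98, 0.95, 0.93, 0.85, 0.72, 0.55, 0.40, 0.25, 0.12, 0.05])

def logistic(log_len, horizon_log, slope):
    """Success probability as a function of log2 task length.
    Goes from ~1 at very short tasks to ~0 at very long ones."""
    return 1.0 / (1.0 + np.exp(slope * (log_len - horizon_log)))

log_lens = np.log2(task_lengths)
params, _ = curve_fit(logistic, log_lens, success_rate, p0=[np.log2(30), 1.0])
horizon_log, slope = params

# The 50% time horizon is where the fitted curve crosses 0.5,
# i.e. exactly at log_len == horizon_log.
print(f"50% time horizon ~ {2 ** horizon_log:.0f} minutes")
```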
We’re working on updating our timelines in a bunch of ways, and I’ve thought a bit more about this.
My current best guess, which isn’t confident, is that if we took a version of the METR HCAST suite which didn’t have any ambiguities or bugs, then the AGIs would have infinite time horizons. And that we should discuss this theoretical task suite rather than the literal HCAST in our timelines forecast, so we should expect a finite-time singularity. If we kept the ambiguities/bugs instead, we’d have an asymptote and a non-infinite value, as would humans with infinite time. In the paragraph you’re quoting, I think the main thing driving non-infinite values is that longer tasks are more likely to have ambiguities/bugs that make them unsolvable with full reliability.
I’m happy to talk about a theoretical HCAST suite with no bugs and infinitely many tasks of arbitrarily long time horizons, for the sake of argument (even though it is a little tricky to reason about, and measuring human performance on it would be impractical).
I think the notion of an “infinite time horizon” system is a poor abstraction, because it implicitly assumes 100% reliability. Almost any practical, complex system has a small probability of error, even if this probability is too small to measure in practice. Once you stop using this abstraction, the argument doesn’t seem to hold up: surely a system that has 99% reliability at million-year tasks has lower than 99% reliability at 10 million-year tasks? This seems true even if a 10 million-year task is nothing more than 10 consecutive million-year tasks, and that seems strictly easier than an average 10 million-year task.
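A quick back-of-the-envelope version of that compounding point (assuming, generously, that the ten chained sub-tasks fail independently; the numbers are purely illustrative):

```python
# If a 10-million-year task were literally 10 consecutive million-year tasks,
# and each sub-task independently succeeded with probability 0.99, the chained
# success rate would already be well below 99% -- and a "real" 10-million-year
# task is presumably at least as hard as the chained version.
p_per_subtask = 0.99
p_chained = p_per_subtask ** 10
print(f"{p_chained:.3f}")  # ~0.904
```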
Yeah, this is the primary argument pushing me toward thinking there shouldn’t be a finite-time singularity; as I mentioned, I’m not confident. It does feel pretty crazy that a limits-of-intelligence ASI would have some (very large) time horizon at which it has 0.00001% reliability though, which I think is unavoidable if we accept the trend.
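To illustrate what accepting the trend implies, here’s a sketch using the same illustrative logistic-in-log-length form as above, with made-up parameters (a one-work-month 50% horizon and a slope of 1 per doubling): for any reliability target, however tiny, the curve predicts some finite task length at which it is reached.

```python
import numpy as np

# Under a logistic-in-log-length model, success never hits exactly zero, so for
# any reliability target -- however tiny -- there is some finite task length at
# which the fitted curve predicts it. Parameters below are made up.
horizon_log2 = np.log2(167 * 60)   # 50% horizon in log2(minutes), ~1 work-month
slope = 1.0                        # per doubling of task length

def length_at_success_rate(p):
    """Task length (minutes) where the logistic predicts success rate p."""
    return 2 ** (horizon_log2 + np.log((1 - p) / p) / slope)

# Length at which predicted reliability drops to 0.00001% (1e-7), in work-months:
print(length_at_success_rate(1e-7) / (167 * 60))
```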
I think how things behave might depend to some extent on how you define an achieved time horizon; if there is a cost/speed requirement, then it becomes more plausible that longer horizon lengths would have ~the same or lower reliability/success rate than shorter ones, once the AI surpasses humans in long-horizon agency. Similar to how, if we created a version of HCAST whose task lengths were based on AI times instead, then at a fixed speed budget human “reliability” might increase at higher time horizons, because our advantage is in long-horizon agency and not speed.
In general, things seem potentially sensitive to definitional choices, and I don’t feel like I’ve fully figured out what the behavior in the limit should be.