Yeah, I agree. But also I think:
Most people aren’t paying super close attention to any given domain
Claims about “human level” performance and the like draw attention but rarely prompt deeper investigation
So if I see a claim about an AI system “surpassing humans” at xyz, I don’t expect to derive useful evidence unless I look a lot more closely
Of course, when it’s couched as a hypothetical, that’s fine, though I do sometimes have an intuition that I’m being led down an implausible path when hypotheticals presume something as hand-wavey as “human level” systems in some domain. Capabilities are spiky!
In forecasting, talking only about what you each expect to actually happen leads to talking past each other, because everyone expects different things. So it’s useful to zero in on the other person’s intended hypotheticals, even if they’re not of interest to you, simply to adopt a framing that helps you understand the rest of what they’re saying.
Hmm, I think we’re seeing right now the pitfalls of terms like “AGI” and “superhuman” in past forecasting. Like, Tyler Cowen keeps saying o3 is AGI, which seems obviously wrong to me, but there’s enough wiggle room in the term that all I can really do is shrug. More specific claims are easier to evaluate down the line. I don’t have a bone to pick with long reports that carefully define their terms and make concrete predictions, but on the margin I think there’s too much ephemeral, buzzy conversation about “this model is PhD level” or “this model is human level at task xyz” or what have you. I’m far less interested in when we’ll have a “human level novelist” system, and far more interested in, say, what year an AI-generated book will first become a New York Times bestseller (and the fact that social forces might prevent this is both a feature and a bug).