I don’t think I get it. If I read this graph correctly, it says that if a human plays chess against an engine and you want the human to achieve equal performance, then the amount of time the human needs to think grows exponentially as the engine gets stronger. This doesn’t make sense if extrapolated downward, but upward it’s about what I would expect: you can compensate for a lack of skill by applying more brute force, but it becomes exponentially costly, which fits the exponential graph.
It’s probably not perfect—I’d worry a lot about strategic mistakes in the opening—but it seems pretty good. So I don’t get how this is an argument against the metric.
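To make the "exponentially costly" reading concrete, here is a rough sketch. It assumes, purely for illustration, that each doubling of thinking time is worth a fixed number of rating points; the function name `thinking_time_needed` and the figure of 100 Elo per doubling are hypothetical and not taken from the graph.

```python
def thinking_time_needed(rating_gap, base_time=1.0, elo_per_doubling=100.0):
    """Thinking time a player needs to offset a rating gap.

    Assumption (not from the graph): every doubling of thinking time
    adds `elo_per_doubling` rating points, so the time needed grows
    exponentially in the gap. `base_time` is the time needed at gap 0.
    """
    if elo_per_doubling <= 0:
        raise ValueError("elo_per_doubling must be positive")
    return base_time * 2 ** (rating_gap / elo_per_doubling)


if __name__ == "__main__":
    # Each extra 100 points of engine strength doubles the required time.
    for gap in (0, 100, 200, 400, 800):
        print(gap, thinking_time_needed(gap))
```

Under this assumption a linear increase in engine strength translates into a multiplicative increase in human thinking time. A negative gap gives times that shrink toward zero, which may be one way to see why the downward extrapolation looks odd.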
It is a decent metric for chess, but a) it doesn’t generalize to other tasks (which is how people seem to interpret the METR paper), and, less importantly, b) I’m quite confident that people wouldn’t beat the chess engines even by thinking for years.