Thinking times are now long enough that, in principle, frontier labs could route some API (or chat) queries to a human on the backend, right? Is this plausible? Could it give them a hype advantage in the medium term if they effectively picked out the types of queries that are most challenging for LLMs, and if so, is there any technical barrier? I can see this kind of thing eventually coming out, if the Wentworth "it's bullshit though" frame turns out to be partially right.
(I’m not suggesting they would do this kind of blatant cheating on benchmarks, and I have no inside knowledge suggesting this has ever happened)
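To make the question concrete, here's a minimal sketch of what such backend routing could look like. Everything here is hypothetical (the heuristic, the function names, the delays); it's only meant to illustrate that the plumbing seems trivial, so any barrier would be logistical or reputational rather than technical:

```python
import time
import random

# Hypothetical sketch only: all names, markers, and delays are made up.
# The point is that the caller can't distinguish a long "thinking time"
# from a human answering in the background.

def looks_hard_for_llms(query: str) -> bool:
    """Crude placeholder heuristic for 'the kinds of queries LLMs struggle with'."""
    hard_markers = ["prove", "novel", "research-level", "olympiad"]
    return any(marker in query.lower() for marker in hard_markers)

def answer_with_model(query: str) -> str:
    # Stand-in for an ordinary model call.
    return f"[model answer to: {query}]"

def answer_with_human(query: str) -> str:
    # Stand-in for handing the query to a human and waiting for their reply.
    time.sleep(random.uniform(60, 300))  # looks like a long reasoning run
    return f"[human-written answer to: {query}]"

def handle_request(query: str) -> str:
    # Either way, the caller sees a single API response after some delay.
    if looks_hard_for_llms(query):
        return answer_with_human(query)
    return answer_with_model(query)

if __name__ == "__main__":
    print(handle_request("Summarize this paragraph."))
```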