I think that’s reasonable, but Anthropic’s claim would still be a little misleading if it can only do tasks that take humans 2 hours. Like, if you claim that your model can work for 30 hours, people don’t have much way to conceptualize what that means except in terms of human task lengths. After all, I can make GPT-4 remain coherent for 30 hours by running it very slowly. So what does this number actually mean?
I think that’s reasonable, but Anthropic’s claim would still be a little misleading if it can only do tasks that take humans 2 hours. Like, if you claim that your model can work for 30 hours, people don’t have much way to conceptualize what that means except in terms of human task lengths. After all, I can make GPT-4 remain coherent for 30 hours by running it very slowly. So what does this number actually mean?