I don’t see how it’s possible to make a useful comparison this way; human and LLM ability profiles, and just the nature of what they’re doing, are too different. An LLM can one-shot tasks that a human would need non-typing time to think about, so in that sense this underestimates the difference; whereas on a task that’s easy for a human but that the LLM can only do with a long chain of thought, it overestimates the difference.
Put differently: the things that LLMs can do in one shot, with no CoT, imply that they can do a whole lot of cognitive work in a single forward pass, maybe a lot more than a human can ever do in the time it takes to type one word. But that cognitive work doesn’t compound like a human’s: it has to pass through the bottleneck of a single token, and be substantially repeated on each future token (at least without modifications like Coconut).
(Edit: The last sentence isn’t quite right — KV caching means the work doesn’t have to all be recomputed, though I would still say it doesn’t compound.)
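To make the bottleneck and the KV-caching caveat concrete, here is a minimal greedy-decoding sketch. The Hugging Face transformers API and GPT-2 are just stand-ins chosen for illustration, not anything specific to the discussion above, and greedy argmax sampling is an assumption for simplicity:

```python
# Minimal sketch: the single-token bottleneck vs. the KV cache.
# Assumes the Hugging Face transformers library; GPT-2 is a stand-in model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

input_ids = tokenizer("The comparison is", return_tensors="pt").input_ids
past_key_values = None  # the KV cache: per-layer keys/values for all prior positions

with torch.no_grad():
    for _ in range(10):
        # With a cache, only the newest token needs a forward pass;
        # prior positions' keys/values are reused, not recomputed.
        step_input = input_ids if past_key_values is None else input_ids[:, -1:]
        out = model(step_input, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values
        # But everything else the pass computed (the full residual stream
        # across every layer) is collapsed to one sampled token id, which is
        # the only thing the next step's computation gets to condition on.
        next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

The cache is what saves the recomputation the edit note points at; the single token id is still the only channel by which the rest of each forward pass reaches the next step, absent latent-space approaches like Coconut.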
Yeah, of course. Just trying to get some kind of rough idea of where future systems will be starting from.
I don’t think it’s an outright meaningless comparison, but I think it’s bad enough that it feels misleading or net-negative-for-discourse to describe it the way your comment did. Not sure how to unpack that feeling further.
Well, I upvoted your comment, which I think adds important nuance. I will also edit my shortform to explicitly tell readers to check your comment. Hopefully the combination of the two is not too misleading. Please add more thoughts on how better to frame this as they occur to you.