As someone who is famously bearish on LLMs, I find this paper completely unconvincing as a reason to be bearish on LLMs.
Every few months, we get both a “this paper shows fundamental LLM limitations!!!” and a “this paper shows LLMs producing innovations!!!”, and both always turn out to be slop. Tiresome.
If you wanted to make a sensible bear case, it would have to rest on the fact that LLMs couldn’t do the tasks because their short-term memory was too full/their context was too long. Amnesia/the lack of long-term memory is a huge reason LLMs simply aren’t used to automate stuff (the other problem being continual learning).
It’s best shown in the Claude Plays Pokemon benchmark (at least without cheating, where I define cheating as humans building game-specific elements to help it win rather than the AI developing them itself): a lot of failures come down to Claude having to relearn strategies it had already developed, or looping dozens of times in situations where a human would have stored the experience in memory and worked out a counter-strategy far quicker.
Yeah, but this case isn’t even an interesting kind of memory failure. It’s just running up against a hard barrier. I’m not even convinced they wouldn’t be able to do this paper’s tasks if you just gave them a basic memory scaffold, like Claude playing Pokemon has.
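For concreteness, here’s a minimal sketch of what I mean by a memory scaffold (call_model() is a hypothetical placeholder, and the real Claude Plays Pokemon harness is more elaborate): keep a small store of distilled notes outside the context window and re-inject it every turn, so lessons survive context truncation.

```python
def call_model(prompt: str) -> str:
    """Stand-in for a real LLM API call (hypothetical placeholder)."""
    return ("OBSERVATION: lost to Brock again.\n"
            "NOTE: Pewter gym resists Normal-type moves; switch to Water/Grass.")

def run_with_memory(task: str, max_turns: int = 3) -> list[str]:
    notes: list[str] = []  # durable memory, never truncated with the context
    for _ in range(max_turns):
        prompt = (
            f"Task: {task}\n"
            "Notes from earlier turns:\n" + "\n".join(notes) + "\n"
            "Act, then record any lesson as a line starting with NOTE:"
        )
        reply = call_model(prompt)
        # Persist only the distilled lessons, not the whole transcript,
        # so the store stays small as the episode grows.
        notes += [ln for ln in reply.splitlines()
                  if ln.startswith("NOTE:") and ln not in notes]
    return notes

print(run_with_memory("Beat the Pewter City gym"))
```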
This “failure” would be demonstrated just as well by asking them to multiply two 10^4-digit numbe–
Wait, that literally was the previous “LLMs can’t follow algorithms” fad! And as with that one, I expect part of the reason LLMs fail at the new paper’s tasks is that their training environments never incentivized them to grind through tons of boring menial operations, and instead incentivized them to look for clever workarounds.
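For a sense of how mechanical that earlier task actually is, here’s the grade-school algorithm sketched in Python (schoolbook_multiply is my own illustrative name, not from either paper). On two 10^4-digit inputs it’s on the order of 10^8 single-digit steps: pure faithful grinding, with no clever workaround available.

```python
def schoolbook_multiply(a: str, b: str) -> str:
    """Grade-school long multiplication on decimal digit strings."""
    result = [0] * (len(a) + len(b))
    # Accumulate every digit-by-digit partial product,
    # least-significant digit first.
    for i, da in enumerate(map(int, reversed(a))):
        for j, db in enumerate(map(int, reversed(b))):
            result[i + j] += da * db
    # One carry-propagation pass normalizes each slot to a single digit.
    for k in range(len(result) - 1):
        result[k + 1] += result[k] // 10
        result[k] %= 10
    return "".join(map(str, reversed(result))).lstrip("0") or "0"

assert schoolbook_multiply("1234", "5678") == str(1234 * 5678)
```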
I feel like doing “tons of boring menial operations” is exactly what many humans (including the ones bearish on AI replacing the workforce) expect these things to be able to do, and at least part of the reason industries invest in them.
I also feel like “look for clever workarounds” is the type of thing that many skeptics fear will lead to undesirable outcomes wrt AI.