That arXiv paper isn’t about “LLMs”, right? From my perspective, the ML models in that paper have roughly no relation to LLMs at all.
Is this a load-bearing part of your expectation that transformer-based LLMs will hit a scaling wall?
No … I brought this up to make a narrow point about imitation learning (a point I elaborate on much more in §2.3.2 of the next post): namely, that imitation learning is present and very important for LLMs, and absent in human brains. (And that arXiv paper is unrelated to this point, because there is no imitation learning anywhere in it.)
Cool paper, thanks!