One thing LLMs can teach us is that memorizing facts is a necessary part of reasoning and intelligent behaviour.
There’s a very simple model under which memorization is important:
If good reasoning resembles test-time compute, judgement involves iterating through lots of potential hypotheses and validating each one.
The speed with which you validate each hypothesis is partly a function of the time it takes to fetch the data needed to assess it.
When that data is memorized (i.e. stored “in your brain’s weights”), access is really fast.
When that data is stored externally (i.e. you have to run a Google search to find it), validation is orders of magnitude (OOMs) slower.
So you’ll be OOMs slower overall if you haven’t internalized the context of the problem.
Further, there are upper bounds on how long the slower process can run (because you get bored, or tired, etc.), so this difference in speed may also end up being a difference in capability. It’s not just that you’re OOMs slower; you may never get there at all.
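A rough way to see the argument in numbers is the toy sketch below. Everything in it is an assumption made up for illustration: the function name, the latencies, the two-hour budget, and the idea that a problem needs a fixed number of hypotheses checked. It only shows how a fixed time budget can turn a speed gap into a capability gap.

```python
def hypotheses_validated(fetch_ms: int, think_ms: int, budget_ms: int) -> int:
    """Count how many hypotheses fit in the budget when each costs a fetch plus some thinking."""
    return budget_ms // (fetch_ms + think_ms)


# All numbers below are illustrative assumptions, not measurements.
BUDGET_MS = 2 * 60 * 60 * 1000   # two hours before boredom or fatigue sets in
THINK_MS = 500                   # time to assess a hypothesis once the data is in hand
RECALL_MS = 100                  # data is memorized
SEARCH_MS = 60_000               # data has to be looked up externally

memorized = hypotheses_validated(RECALL_MS, THINK_MS, BUDGET_MS)
external = hypotheses_validated(SEARCH_MS, THINK_MS, BUDGET_MS)

# Suppose (hypothetically) the problem needs 1,000 hypotheses checked.
NEEDED = 1_000
print(memorized, memorized >= NEEDED)  # 12000 True
print(external, external >= NEEDED)    # 119 False
```

With these made-up numbers the memorized case gets through about 100x more hypotheses, and only the memorized case reaches the threshold before the budget runs out.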
This doesn’t please me as I generally dislike memorization, but the logic seems plausible.
Inspired by this tweet by Rohit Krishnan: https://x.com/krishnanrohit/status/1923097822385086555