Despite impressive capabilities, large language models have yet to produce a genuine breakthrough. The puzzle is why.
One reason may be that they lack some fundamental aspects of human thought: they are frozen, unable to learn from experience, and they have no “default mode” for background processing, a source of spontaneous insight in humans.
To illustrate the issue, I describe such insights, and give a concrete example algorithm for a day-dreaming loop (DDL): a background process that continuously samples pairs of concepts from memory. A generator model explores non-obvious links between them, and a critic model filters the results for genuinely valuable ideas. These discoveries are fed back into the system’s memory, creating a compounding feedback loop in which new ideas themselves become seeds for future combinations.
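For concreteness, here is a minimal Python sketch of that loop. The `generate_link` and `critic_score` stand-ins and the `SCORE_THRESHOLD` cutoff are my assumptions for illustration (a real system would call LLMs at those points); the essay specifies the loop’s structure, not an implementation:

```python
import random

SCORE_THRESHOLD = 0.9  # assumed cutoff; most candidate ideas are discarded

def generate_link(concept_a, concept_b):
    """Stand-in for the generator model: propose a non-obvious
    connection between two concepts. A real system would prompt an LLM."""
    return f"possible link between {concept_a!r} and {concept_b!r}"

def critic_score(idea):
    """Stand-in for the critic model: rate how novel and valuable an
    idea is. A real system would call a second LLM as a judge."""
    return random.random()

def daydream_step(memory):
    """One DDL iteration: sample a concept pair, explore a link between
    them, and keep the result only if the critic rates it highly."""
    concept_a, concept_b = random.sample(memory, 2)
    idea = generate_link(concept_a, concept_b)
    if critic_score(idea) >= SCORE_THRESHOLD:
        memory.append(idea)  # feedback: new ideas seed future pairings
        return idea
    return None

memory = ["default mode network", "distillation", "data wall", "vampire attacks"]
for _ in range(1_000):  # in practice this would run continuously in the background
    daydream_step(memory)
```

Note the compounding structure: because accepted ideas are appended to `memory`, later samples can pair an idea with another idea, not just with raw facts.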
The cost of this process—a “daydreaming tax”—would be substantial, given the low hit rate for truly novel connections. This expense, however, may be the necessary price for innovation. It would also create a moat against model distillation, as valuable insights emerge from the combinations no one would know to ask for.
The strategic implication is counterintuitive: to make AI cheaper and faster for end users, we might first need to build systems that spend most of their compute on this “wasteful” background search. This suggests a future where expensive, daydreaming AIs are used primarily to generate proprietary training data for the next generation of efficient models, offering a path around the looming data wall.
I’d also highlight the obstacles and implications sections:
…Just expensive. Based on the human example, we could ballpark it at <20:1 as an upper bound, which would have severe implications for LLM-based research: a good LLM solution might be 2 orders of magnitude (OOMs) more expensive per task than the plain LLM call itself. Obvious optimizations like shifting load to the cheapest electricity region or running batch jobs can reduce the cost, but not by that much.
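A back-of-the-envelope version of that claim (the 20:1 overhead is the upper bound from the text; the hit rate is an illustrative figure I am assuming):

```python
import math

overhead = 20     # background:foreground compute, the <20:1 upper bound above
hit_rate = 0.05   # assumed fraction of daydreams yielding a useful idea

cost_per_task = 1 + overhead            # in units of one plain LLM call
cost_per_insight = overhead / hit_rate  # compute burned per useful idea

print(f"per-task multiplier: {cost_per_task}x (~{math.log10(cost_per_task):.1f} OOMs)")
print(f"per useful insight: {cost_per_insight:.0f}x (~{math.log10(cost_per_insight):.1f} OOMs)")
```

Under these assumptions, the raw overhead alone is only ~1.3 OOMs; it is the low hit rate that pushes the effective cost of actually obtaining an insight into the 2-OOM-plus range.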
Cheap, good, fast: pick 2. So LLMs may gain much of their economic efficiency over humans by making a severe tradeoff: forgoing novelty generation and long-duration agency. If so, few users will want to pay 20× more for their LLM usage just because, once in a while, it may yield a novel insight.
This will be especially true if there is no way to save compute by narrowing the retrieved facts down to just the user-relevant ones; it may be that the most far-flung, low-prior connections are the important ones, so there is no easy way to improve, no matter how annoyed the user is at receiving random puns or interesting facts about the CIA faking vampire attacks.
Only power-users, researchers, or autonomous agents will want to pay the ‘daydreaming tax’ (either as the higher upfront capital cost of training, or as paying for online daydreaming to specialize in the current problem for asymptotic scaling improvements; see AI researcher Andy Jones, 2021).
Data moat. So this might become a major form of RL scaling, with billions of dollars of compute going into ‘daydreaming AIs’ to avoid the “data wall” and create proprietary training data for the next generation of small, cheap LLMs. (And it is those that are served directly to most paying users, with the most expensive tiers reserved for the most valuable purposes, like R&D.) These daydreams serve as an interesting moat against naive data distillation from API transcripts and cheap cloning of frontier models: that kind of distillation works only for things you know to ask about, and the point here is that you don’t know what to ask about. (And if you did, you wouldn’t need to use any API, either.)
Given RL scaling laws and rising capital investments, it may be that LLMs will need to become slow & expensive so they can be fast & cheap.
Gwern’s essay you mentioned, in case others are curious: https://gwern.net/ai-daydreaming