Compression Regime Determines Whether a Neural Network Memorises or Discovers Concepts
The one-sentence version: A contrastive model trained on temporal co-occurrence in 10,000 novels, forced into a compression regime where it cannot memorise specific relationships, discovers transferable structural categories that correspond to narrative function rather than topic, and these categories generalise to novels the model has never seen.
The problem of unsupervised concept formation
There’s a long-standing question in cognitive science about how concepts form from experience. The empiricist answer is something like: you see enough cats, and the overlapping features crystallise into a category. The problem is that this only produces similarity-based categories, where things that look alike get grouped together.
But many useful categories are not similarity-based. A chase scene in a Western and a chase scene in a Gothic novel share no surface features. Different vocabulary, different setting, different century. Yet they occupy the same structural position in their respective narratives: tension before, resolution after, tempo change during. They are doing the same thing, even though they are not about the same thing. A category system that only groups by similarity will never put these together. A category system that groups by function will.
The question is whether functional categories can emerge unsupervised from raw sequential experience, without anyone labelling what the functions are.
arXiv | Demo
What I did
I took 9,766 books from Project Gutenberg, split them into 24.96 million short passages and trained a small neural network (a 4-layer MLP, 29.4M parameters) with a contrastive objective: given a passage, learn to distinguish its actual temporal neighbours from random passages in the corpus. In effect, the model learns to predict what a passage's temporal neighbours look like in embedding space.
The critical detail is that the model is far too small to memorise these relationships. Training accuracy plateaus at 42.75%, meaning the model gets the specific prediction wrong more often than it gets it right. It has to compress, to find the recurring patterns that let it do better than chance on a task it cannot solve by brute force.
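Concretely, the contrastive objective can be sketched as an InfoNCE-style loss. This is a standard form, not necessarily the paper's exact loss function, so treat the details (temperature value, in-batch negatives) as illustrative assumptions:

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """One training example's loss, sketched: push the anchor passage
    towards its true temporal neighbour (positive) and away from random
    passages (negatives). All vectors are assumed unit-normalised."""
    # score the positive and every negative against the anchor
    scores = np.concatenate([[anchor @ positive],
                             negatives @ anchor]) / temperature
    scores -= scores.max()           # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return -np.log(probs[0])         # the positive sits at index 0
```

Training accuracy in this framing is the fraction of passages for which the true neighbour outscores every negative; a plateau at 42.75% means the model loses that contest more often than it wins.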
After training, each passage has a new representation that encodes not what the passage means but what kind of structural neighbourhood it lives in. Clustering these representations at different granularities (k=50 to k=2,000) produces groupings that are qualitatively different from what a standard embedding model produces.
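As a minimal sketch of the clustering step, assuming plain k-means over the learned representations (the paper's actual clustering algorithm may differ, and at 24.96M passages a minibatch variant would be needed in practice):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means over passage representations. The granularities
    k=50..2000 come from the text; the algorithm choice here is an
    illustrative assumption."""
    rng = np.random.default_rng(seed)
    # initialise centroids from k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each representation to its nearest centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each centroid to the mean of its members
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return labels, centroids
```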
What the model discovered
At k=100, one cluster containing 460,000 passages from 5,088 books captures “direct confrontation and negotiation,” the mode where characters are locked in verbal contest with real stakes. It spans diplomatic fiction, Edwardian drawing-room arguments, science fiction, police interrogations and theological debates, because all of these involve the same underlying dynamic regardless of setting. Another captures “cynical worldly wisdom,” the register in which a narrator or character makes pragmatic observations about human nature, remarkably consistent from Jane Austen through to H. G. Wells. A third identifies “lyrical landscape meditation,” the moment when prose slows down to describe a physical scene in language doing emotional rather than informational work. Romantic poetry, Gothic atmospherics, travel writing and pastoral fiction all share this mode, because structurally they’re doing the same thing: using description to modulate the reader’s emotional state.
These are not topics. A standard embedding model (BGE-large-en-v1.5) groups all fear-related passages together. The PAM model groups passages by whether the fear functions as horror atmosphere, comic relief, military tension or romantic suspense, because these have different temporal neighbourhoods: different kinds of text precede and follow them.
At k=1,000–2,000, the model isolates specific registers with an almost anthropological precision: sailor dialect as a vernacular tradition across adventure fiction and naval history, Victorian scientific correspondence conventions, witch trials emerging as a distinct branch of legal procedure and, inevitably, the affectionate-observational cat register (distinct from zoological description of cats, and apparently stable across centuries).
Transfer to unseen novels
Five novels unseen by the model during training (Pride and Prejudice, Dracula, Frankenstein, Alice in Wonderland and War of the Worlds) were assigned to existing clusters without retraining.
Alice in Wonderland is the most striking case. The model concentrates 77.6% of the book’s passages into just five clusters, while standard embedding assignment scatters them across nearly all clusters. The Mad Hatter’s tea party and the Queen’s croquet game land in the same cluster, because both are structured as absurdist authority figures presiding over nonsensical social rituals. Different characters, different settings, different vocabularies, same narrative function. The absences matter too: “solitary journey with introspection,” which dominates both Frankenstein and War of the Worlds, accounts for less than 1% of Alice. Alice doesn’t introspect; she reacts. The model picks up on this without being told anything about any of these novels.
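A minimal sketch of how transfer without retraining can work, assuming nearest-centroid assignment (the function name and the assignment rule are illustrative, not the paper's exact procedure):

```python
import numpy as np

def assign_unseen_book(passages, centroids, top=5):
    """Assign each passage representation of an unseen novel to its
    nearest existing cluster centroid (no retraining), then measure how
    concentrated the book is in its `top` most-used clusters. The 77.6%
    figure for Alice comes from the paper's own procedure, not this one."""
    d = ((passages[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(1)
    counts = np.bincount(labels, minlength=len(centroids))
    # fraction of the book's passages that fall in its `top` biggest clusters
    concentration = np.sort(counts)[::-1][:top].sum() / len(passages)
    return labels, concentration
```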
Controls
Three controls rule out the obvious alternative explanations.
Temporal shuffle collapses the signal by 95.2%. Randomly reorder passages within books, retrain from scratch, and the clusters lose their structural coherence. The model learned genuine sequential structure, not an artefact of the embedding geometry.
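The shuffle control itself is simple to state in code. This sketch assumes a parallel-array layout for passage order and book ids, which is a hypothetical simplification of the actual pipeline:

```python
import numpy as np

def shuffle_within_books(passage_order, book_ids, seed=0):
    """Permute passage order independently within each book, so every
    book keeps the same passage *set* but loses its sequence. Training
    pairs built from this order should carry no structural signal."""
    rng = np.random.default_rng(seed)
    passage_order = np.asarray(passage_order).copy()
    book_ids = np.asarray(book_ids)
    for b in np.unique(book_ids):
        idx = np.where(book_ids == b)[0]
        passage_order[idx] = rng.permutation(passage_order[idx])
    return passage_order
```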
A random MLP baseline (untrained network, same architecture) produces clusters that correlate strongly with standard embedding clusters, confirming that the training, not the architecture, produces the structural groupings.
At k=100, clusters draw from an average of 4,508 books. These are corpus-wide regularities, not author quirks or period effects.
Why the compression regime matters: three papers, one training signal
The result above is interesting on its own, but it becomes more interesting in the context of two earlier papers that used the same architecture and the same training signal with very different outcomes.
This is the third paper in a sequence. All three train a contrastive MLP on temporal co-occurrence. The architectures are nearly identical. The outcomes are qualitatively different, and the difference traces to a single variable: how hard the capacity bottleneck squeezes.
Paper 1, PAM (arXiv): A predictor trained on temporal co-occurrence in a synthetic environment achieves Association Precision@1 = 0.970, meaning its top retrieval is a true temporal associate 97% of the time. It recalls associations across representational boundaries where cosine similarity scores zero (Cross-Boundary Recall@20 = 0.421 vs cosine 0.000). Temporal shuffle destroys the signal by 90%, confirming genuine sequential structure. Held-out query states produce zero recall: the model remembers from perspectives it experienced, not novel ones. This is faithful episodic memorisation. Nothing transfers to unseen associations.
Paper 2, AAR (submitted): The same contrastive MLP trained on passage co-occurrence in a question-answering corpus achieves 97% training accuracy and substantially improves retrieval of the second passage needed to answer multi-step questions (the kind where you need fact A to know you should look for fact B). But a model trained on one set of questions and tested on a different set shows no improvement at all. On a harder benchmark with longer reasoning chains, it actively hurts. Again, the associations are corpus-specific and don’t transfer – given AAR was designed to aid in corpus retrieval applications, this was the correct result.
Paper 3, Concept Discovery (this work): 42.75% training accuracy. Transferable structural patterns.
| Paper | Training accuracy | Behaviour | Inductive transfer |
| --- | --- | --- | --- |
| PAM | ~97% (synthetic) | Episodic memorisation | None (by design) |
| AAR | 97% (HotpotQA) | Corpus-specific retrieval | None (by design) |
| Concept Discovery | 42.75% (Gutenberg) | Structural pattern extraction | Works |
The hypothesis is that there’s a compression threshold below which a contrastive model transitions from memorising specific associations to extracting recurring structural patterns. Above the threshold, you get a lookup table. Below it, you get concepts.
This connects to information bottleneck theory: the model retains the information about co-occurrence structure that compresses best, which is the structural regularities rather than the individual instances. But it’s an empirical finding, not derived from theory. The threshold likely depends on the ratio of training pairs to model capacity, the structural regularity of the domain and possibly the distribution of co-occurrence frequencies. Characterising the transition itself, understanding where it falls and what controls its sharpness, is future work.
The biological parallel
The biological analogy is hippocampal-neocortical transfer. The hippocampus rapidly encodes specific episodes (PAM’s regime: faithful memorisation of experienced associations). During sleep, hippocampal replay consolidates these episodes into the neocortex, which extracts slow statistical regularities (concept discovery’s regime: multi-epoch training on fixed data, forced compression). The compression bottleneck plays the role of the neocortex’s limited encoding rate.
I want to be precise about the strength of this analogy. The computational mechanism is: replay specific episodes repeatedly through a capacity-limited network until the network extracts the structural patterns that recur across episodes. That is what multi-epoch contrastive training on fixed co-occurrence data does. The mapping is not decorative. The question is whether the mechanism that produces transferable concepts from compressed literary experience is the same mechanism, in the relevant computational sense, that produces transferable concepts from compressed perceptual experience. I don’t know yet. But the ingredients are the same: sequential co-occurrence as training signal, limited capacity as compression force and multi-pass consolidation as the extraction process.
To be even more speculative: the compression bottleneck may be doing the computational work that forgetting does in biological systems. An organism moving through time can’t store every detail of every experience. What it retains is shaped by capacity limits, and those limits force exactly the kind of pattern extraction we observe here. Forgetting isn’t information loss; it’s a compression algorithm that turns episodes into concepts. Multi-epoch training on fixed data, where the model repeatedly fails to memorise individual relationships and is forced to settle for structural regularities, may be a batch-processing analogue of what a capacity-limited biological memory does continuously.
The three papers together suggest that the same learning mechanism, under different compression regimes, produces the functional equivalent of episodic memory (specific, non-transferable) and semantic memory (general, transferable). In cognitive science this is the complementary learning systems framework: the hippocampus does fast specific encoding, the neocortex does slow pattern extraction and the two interact through replay. What we have is a system where the “hippocampal” and “neocortical” regimes are not different architectures but different operating points of the same architecture, separated by a compression parameter. That’s a simpler story than complementary learning systems theory predicts, and it may be wrong at biological scales, but it’s what the data shows across these three papers.
Conceptualisation as a consequence of compression
The result I keep returning to is not any particular cluster but the fact that the transfer works at all. The model was trained on co-occurrence statistics from 9,766 novels. It was never given a concept of “confrontation” or “introspection” or “absurdist social ritual.” It was never optimised for transfer. It was optimised to predict co-occurrence, failed at that task more often than it succeeded and in the process of failing arrived at structural categories that apply to books it has never seen.
That is concept formation from raw experience, and the mechanism is compression. The model can’t remember every individual relationship, so it extracts what recurs. What recurs turns out to be structurally meaningful. And because it was distilled from thousands of independent instances rather than memorised from any particular one, it generalises.
The question this raises, and the one that drives the broader research programme, is whether the same mechanism works beyond text. An embodied agent moving through the world accumulates exactly this kind of sequential co-occurrence data across vision, sound, touch and language simultaneously. If temporal co-occurrence under compression produces transferable functional categories from multimodal perceptual experience, in the same way it produces them from literary experience, then you have a system that forms its own concepts from the structure of the world without human-provided labels. The text corpus is the tractable test case. The real target is grounded multimodal experience.
One important caveat: the concepts the model discovers are bounded by the experience it was trained on. It generalises to unseen novels but those novels are drawn from the same broad literary tradition as the training corpus (English-language texts, predominantly 18th to early 20th century). A contemporary novel that employs narrative structures without precedent in Project Gutenberg, or that deliberately subverts the conventions the model has learned, would be assigned to clusters that don’t quite fit. The model’s concepts are real generalisations but they’re generalisations within its lived experience. It knows what literature has done, not what literature could do. Extending the corpus to include modern fiction, non-Western literary traditions and other text genres is an obvious direction for future work, and would test whether the mechanism discovers new structural categories or forces everything into the patterns it already has.
Cross-domain validation on biological data (protein interaction networks, gene expression) is already in progress and shows the same core signal in a completely different domain, with some interesting divergences around inductive transfer that sharpen the theoretical picture. The research programme is documented at eridos.ai/research.
The demo
The interactive demo renders novels as colour-coded concept timelines. You can scrub through all six resolution levels, click an AI explainer for any cluster and even upload or paste your own text to see how it’s clustered. It’s the fastest way to build intuition for what “narrative function” means as opposed to “topic,” and it’s more convincing than anything I can write here.
The concept discovery landing page is at eridos.ai/concept-discovery. The paper is on arXiv at https://arxiv.org/abs/2603.18420. Code is on GitHub.