Factories learn, and forget

The spider chart showing exposure by occupational categories in the recent labour market report from Anthropic is striking—manufacturing sits almost at the bottom. The standard explanation is a capability gap—AI systems struggle with physical reasoning, so they haven’t penetrated jobs involving physical processes. This is probably true.

I remember reading a post last year that benchmarked frontier models on a basic machining task. Seeing how little progress has been made since got me thinking about a different problem underneath the capability gap.

It is that factories don’t accumulate learning.

I saw this during my time in production engineering at Toyota. During vehicle launches, my role was to ensure production preparation across the factory was completed on schedule. One of the most consistently tight parts of every launch timeline was the paint shop.

Each new model typically introduced five or six new colour variants. Although colours were defined globally in Japan, the materials used to produce them and the surfaces they were applied to were often sourced locally in each country. This introduced small but meaningful variation.

One recurring issue was colour matching between metal body panels and resin components like bumpers. Even with identical paint formulations, resin parts would sometimes come out with a slightly different shade. The issue was most pronounced in new variants and existing lighter colours. The deviation wasn’t easily visible to the naked eye, but it didn’t meet launch quality standards.

So during every launch, engineers would run multiple rounds of trials, adjusting pigment ratios, modifying process parameters, coordinating with suppliers on material composition, and repainting test parts until the colours converged. The process worked, but it often created schedule pressure in the run-up to production.

Looking back, I wonder—why were we rediscovering these adjustments every time?

If you look at the paint matching process carefully, it's a sequence of controlled experiments. A typical loop: run a trial, observe the deviation, adjust a parameter, run another trial. Each cycle produces real information about how specific pigments behave with particular resin compositions, how local supplier materials differ from global specifications, and how process conditions affect final colour.

A single launch might involve a few such cycles across multiple colour variants. Multiply that across vehicle programs, plants, and geographies, and the number of experiments becomes significant. But this knowledge almost never became a structured dataset. It lived in static reports, supplier emails, spreadsheets, and the memory of individual engineers.
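To make "structured dataset" concrete, here is a minimal sketch of what capturing one trial cycle as data could look like. Every field name and value below is hypothetical, not an actual paint-shop schema; colour deviation, for instance, is commonly measured as a delta-E value, so that stands in as the observed outcome:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class PaintTrial:
    """One cycle of the colour-matching loop, captured as a record.

    All fields are illustrative, not a real manufacturer's schema.
    """
    programme: str        # vehicle launch programme
    colour_code: str      # globally defined colour variant
    substrate: str        # e.g. "steel" panel or "resin" bumper
    supplier: str         # local material supplier
    pigment_ratio: float  # the parameter adjusted in this trial
    bake_temp_c: float    # process condition during the trial
    delta_e: float        # measured colour deviation vs. reference

# One trial, serialised so it can join a growing dataset
# instead of living in a spreadsheet or an email thread.
trial = PaintTrial(
    programme="launch-a",
    colour_code="pearl-white",
    substrate="resin",
    supplier="local-supplier-1",
    pigment_ratio=0.42,
    bake_temp_c=140.0,
    delta_e=1.8,
)
print(json.dumps(asdict(trial)))
```

The point isn't this particular schema; it's that each trial, serialised like this, becomes a row that outlives the engineer who ran it.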

The learning stayed local, human, and non-cumulative.

This isn’t because engineers were careless or unaware of past experience. The knowledge was genuinely distributed across teams, suppliers, documents, and individual memory, in ways that made it hard to integrate into anything predictive without relying on a specific person who was there.

The loop that actually ran:

problem → experiment → adjustment → production continues

The loop that could’ve run:

experiment → dataset → model → better prediction next time

Manufacturing systems tend to learn episodically rather than cumulatively. Each launch generates real insights. But those insights rarely feed into a system that improves predictions for future launches. Similar problems get partially rediscovered across different programs because the knowledge isn’t structured in a way that’s usable by anyone who wasn’t present.

In many factories, the most sophisticated model of the production process isn't in any software system; it exists in the heads of a few experienced engineers. When those engineers move on, much of it goes with them.

The reason this persists isn’t that people don’t see the problem. Most senior engineers know exactly what’s being lost, but the issue is structural.

Each vehicle launch gets a budget to execute that launch. There's no line item for capturing knowledge that benefits the next launch, because the next launch has its own budget and maybe its own team. The people who'd benefit most from better learning infrastructure have no seat at the table when current project costs are being justified. And the cost of rediscovery never shows up on anyone's balance sheet; it just gets absorbed into the next launch timeline as normal schedule pressure.

Unless someone with enough seniority and conviction makes it a priority, the incentive to build anything long-term simply isn’t there. And even when someone does, it tends to last as long as they do, unless it is a global directive from Japan.

The paint shop is one instance of a broader pattern. I saw versions of this iterative tuning across the factory in different shops—press, weld and assembly. Each activity generates practical knowledge about how physical systems behave under real conditions. Almost none of it is captured in a form that compounds.

The Anthropic report is careful about this. It acknowledges that low observed exposure might reflect adoption barriers rather than pure capability limits. But I think there's a more fundamental issue specific to manufacturing: even if AI capability for physical reasoning improved substantially, there isn't much accumulated data for it to work with.

The two problems compound.

Weak learning infrastructure means fewer structured datasets. Fewer datasets mean weaker models for manufacturing-specific tasks. Weaker models make the capability gap look larger than it might actually be. The gap is real, but it’s partly self-reinforcing.

From a machine learning perspective, factories have been generating training data for decades. The surprising thing isn't that AI struggles with manufacturing. It's that the dataset has been producing itself all along, and was simply never captured in any systematic way.

Much of what gets described as tacit manufacturing knowledge may simply be the residue of experiments whose results were never systematically recorded.