This post deserves to be remembered as a LessWrong classic.
It directly tries to solve a difficult and important cluster of problems (whether it succeeds is yet to be seen).
It uses a new diagrammatic method of manipulating sets of independence relations.
It’s a technical result! These feel like they’re getting rarer on LessWrong and should be encouraged.
There are several problems that are fundamentally about attaching very different world models together and transferring information from one to the other.
Ontology identification involves taking a goal defined in an old ontology[1] and accurately translating it into a new ontology.
High-level models and low-level models need to interact in a bounded agent. I.e. learning a high-level fact should influence your knowledge about low-level facts and vice versa.
Value identification is the problem of translating values from a human to an AI. This is much like ontology identification, with the added difficulty that we don’t get as much detailed access or control over the human world model.
Interpretability is about finding recognisable concepts and algorithms in trained neural networks.
In general, we can solve these problems using shared variables and shared sub-structures that are present in both models.
We can stitch together very different world models along shared variables. E.g. suppose you have two models of molecular dynamics, one faster and simpler than the other. You want to simulate in the fast one, then switch to the slow one when particular interactions happen. To transfer the state from one to the other, you identify variables present in both models (probably atom locations, velocities, and some others), then just copy these values to the other model. Under-specified variables must be inferred from priors.
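To make that stitching step concrete, here is a minimal sketch (the variable names and toy priors are hypothetical, not from the post): copy the variables both models share, then fill in anything the fast model never tracked by sampling from a prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical state of the fast, coarse model: only the variables it tracks.
fast_state = {
    "positions": rng.normal(size=(100, 3)),   # atom locations
    "velocities": rng.normal(size=(100, 3)),  # atom velocities
}

# Variables only the slow, detailed model tracks, each with a toy prior to
# sample from (a real prior would condition on the shared state).
slow_only_priors = {
    "bond_angles": lambda shared: rng.uniform(0, np.pi, size=100),
    "charges": lambda shared: rng.normal(0.0, 0.1, size=100),
}

def stitch(fast_state, slow_only_priors):
    """Build an initial state for the detailed model from the coarse model's state."""
    slow_state = dict(fast_state)  # shared variables: just copy the values over
    for name, sample in slow_only_priors.items():
        slow_state[name] = sample(fast_state)  # under-specified variables: infer from priors
    return slow_state

slow_state = stitch(fast_state, slow_only_priors)
```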
If you want to transfer a new concept from WM1 to a less knowledgeable WM2, you can do so by identifying the lower-level concepts that both WMs share, then constructing an “explanation” out of those concepts. An “explanation” would look like a WM fragment purely built out of variables and structures already in WM2.
An explanation is also a pointer. If you want to point at a very specific concept in someone else’s WM, one way to do so is to explain that concept (in terms of lower level ideas that you are confident are shared).
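As a toy sketch of what such an explanation could look like (the concept names are made up for illustration): represent each WM as a map from concepts to definitions, and expand the unfamiliar concept until every leaf is something the receiving WM already has.

```python
# Each WM maps a concept name to a definition: a tuple of lower-level concepts.
# WM1 knows "seesaw" and "lever"; WM2 only knows the lower-level pieces.
WM1 = {
    "seesaw": ("lever", "plank"),
    "lever": ("rigid_rod", "pivot"),
    "rigid_rod": (), "pivot": (), "plank": (),
}
WM2 = {"rigid_rod": (), "pivot": (), "plank": ()}

def explain(concept, source_wm, target_wm):
    """Expand `concept` until every leaf is a concept the target WM already has.

    The result is the "explanation": a fragment of WM1 expressed entirely in
    WM2's existing vocabulary, which also serves as a pointer to the concept.
    """
    if concept in target_wm:
        return concept
    return {concept: tuple(explain(part, source_wm, target_wm)
                           for part in source_wm[concept])}

print(explain("seesaw", WM1, WM2))
# {'seesaw': ({'lever': ('rigid_rod', 'pivot')}, 'plank')}
```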
Natural latents are a step toward solving all of these problems, via describing a subset of variables/structures that we should expect to find across all WMs (and more importantly, some of the conditions required for them to be present).
A natural latent should be extremely useful for any WM that contains variables which share redundant information. I think we can expect this to be common when highly redundant observations are compressed.
For example: If the ~same observation happens more than once, then any learner that generalizes well is going to notice this. It must somehow store the duplicated information (as well as each of the places where it is duplicated). That shared information is a natural latent. The result in this post suggests that this summary information should be isomorphic between agents, under the right conditions.[2]
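A minimal numerical sketch of that redundancy (a standard biased-coin toy example, not anything specific from the post): the shared information here is the coin's bias, which can be recovered from either chunk of the observations alone, and conditional on which the chunks are independent.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "repeated observation": many flips of one coin with a fixed unknown bias.
true_bias = 0.7
flips = rng.random(100_000) < true_bias

# Two chunks standing in for two places where the ~same information shows up
# (or for two agents who each saw part of the data).
chunk_a, chunk_b = flips[:50_000], flips[50_000:]

# The natural latent candidate is whatever can be recovered from either chunk
# alone: here, an estimate of the bias.
latent_from_a = chunk_a.mean()
latent_from_b = chunk_b.mean()
print(latent_from_a, latent_from_b)  # both ~0.70: the latent is redundant

# Mediation holds by construction: given the bias, the chunks are independent,
# so the bias is (approximately) all the information the two chunks share.
```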
As far as I know, it’s an open question which properties of environments and agents imply lots of natural latents.
The current state of this work has some limitations:
Both learners need access to the same or very similar low-level observables X.
Both learners must have learned the same beliefs, otherwise their latents may be very different (although this is kinda fixed with an additional constraint on the latent).
John and David seem to have run into difficulties building useful applications of this theory.
John’s posts aren’t clear on how to identify and separate out the X variables from a general stream of data (although this seems fine for now).
With lots of compute, approximate models can be dropped in exchange for detailed models. One might drop the concept of “tree” in exchange for a complete categorization of types of trees.
On the one hand, such a model is still tracking the same latent information, and the theorems still work. On the other hand, it isn’t necessarily storing that information in an easy-to-access way. This is fine for communication, but less fine for interpretability or manual joining of WMs.
Perhaps there is some assumption we can make that guarantees all levels of abstraction will remain stored. Or perhaps we should expect interpretability of a WM to often involve some inferential work on the part of that WM.
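A toy picture of the accessibility worry (made-up categories and numbers): a detailed model that only stores fine-grained categories still carries the coarse “tree” latent, but only as something that has to be computed on demand.

```python
# A detailed WM stores beliefs over fine categories and no "tree" variable.
fine_belief = {
    "oak": 0.20, "pine": 0.15, "birch": 0.05,   # kinds of tree
    "fern": 0.10, "moss": 0.50,                 # non-trees
}

# The coarse concept is a function of the fine model, not a stored variable:
# recovering it is the "inferential work" mentioned above.
TREE_KINDS = {"oak", "pine", "birch"}
p_tree = sum(p for kind, p in fine_belief.items() if kind in TREE_KINDS)
print(round(p_tree, 2))  # 0.4 -- the information is there, but only after aggregation
```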
If this line of research goes well, I hope that we will have theorems that say something like:
“For some bounded agent design, given observations of some complicated well-understood data-generating structure Z, and enough compute / attentional resources, the agent will learn a model of Z which contains parts x, y, w (each with predictable levels of approximate isomorphism to parts of the real Z). Upon observing more data, we can expect some parts (x, y) of this structure to remain unchanged.”
Think of an ontology as the choice of variables in a particular Bayes net, for our current purposes.
I’m leaning on the algorithmic definition of natural latents here.