# Testing The Natural Abstraction Hypothesis: Project Update

I set myself six months to focus primarily on the Natural Abstraction Hypothesis project before stopping to re-evaluate. It’s been about six months since then. So, how has it gone?

This will be a “story telling” post, where I talk more about my research process and reasoning than about the results themselves. Be warned: this means I’m going to spout some technical stuff without explanation here and there, and in some cases I haven’t even written a good explanation yet—this is a picture of my own thoughts. For more background on the results, the three main posts are:

The intro post for the overarching project, which I recommend reading.

Information At A Distance Is Mediated By Deterministic Constraints, which I also recommend reading.

Generalizing Koopman-Pitman-Darmois, which I do not recommend reading unless you want dense math.

## Recap: The Original Plan

The Project Intro broke the Natural Abstraction Hypothesis into three sub-hypotheses:

Abstractability: for most physical systems, the information relevant “far away” can be represented by a summary much lower-dimensional than the system itself.

Human-Compatibility: These summaries are the abstractions used by humans in day-to-day thought/language.

Convergence: a wide variety of cognitive architectures learn and use approximately-the-same summaries.

That post suggested three types of experiments to test these:

Abstractability: does reality abstract well? Corresponding experiment type: run a reasonably-detailed low-level simulation of something realistic; see if info-at-a-distance is low-dimensional.

Human-Compatibility: do these match human abstractions? Corresponding experiment type: run a reasonably-detailed low-level simulation of something realistic; see if info-at-a-distance recovers human-recognizable abstractions.

Convergence: are these abstractions learned/used by a wide variety of cognitive architectures? Corresponding experiment type: train a predictor/agent against a simulated environment with known abstractions; look for a learned abstract model.

Alas, in order to run these sorts of experiments, we first need to solve some tough algorithmic problems. Computing information-at-a-distance in reasonably-complex simulated environments is a necessary step for all of these, and the “naive” brute-force method for this is very-not-tractable. It requires evaluating high-dimensional integrals over “noise” variables—a #P-complete problem in general. (#P-complete is sort of like NP-complete, but Harder.) Even just representing abstractions efficiently is hard—we’re talking about e.g. the state-distribution of a bunch of little patches of wood in some chunk of a chair given the state-distribution of some other little patches of wood in some other chunk of the chair. Explicitly writing out that whole distribution would take an amount of space exponential in the number of variables involved; that would be a data structure of size roughly O((# of states for a patch of wood)^(# of patches)).
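
To make the exponential blow-up concrete, here is a rough back-of-the-envelope sketch. The helper name `joint_table_entries` and the numbers (10 states per patch, 8 bytes per table entry) are illustrative assumptions, not figures from the project.

```python
def joint_table_entries(states_per_patch: int, num_patches: int) -> int:
    """Entries needed to write out a full joint distribution explicitly."""
    return states_per_patch ** num_patches


# Hypothetical numbers: 10 distinguishable states per patch, 8 bytes per entry.
STATES_PER_PATCH = 10
BYTES_PER_ENTRY = 8

for num_patches in (5, 10, 20, 40):
    entries = joint_table_entries(STATES_PER_PATCH, num_patches)
    gigabytes = entries * BYTES_PER_ENTRY / 1e9
    print(f"{num_patches:>2} patches: {entries:.1e} entries, about {gigabytes:.1e} GB")
```

Each extra patch multiplies the table size by the number of states per patch, so even a few dozen patches are far out of reach for an explicit table.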

My main goal for the past 6 months was to develop tools to make the experiments tractable—i.e. theorems, algorithms, working code, and proofs-of-concept to solve the efficiency problems.

When this 6 month subproject started out, I had a working proof-of-concept for linear systems. I was hoping that I could push that to somewhat more complex systems via linear approximations, figure out some useful principles empirically, and generally get a nice engineering-experiment-theory feedback loop going. That’s the fast way to make progress.

## … Turns Out Chaos Is Not Linear

The whole “start with linear approximations and get a nice engineering-experiment-theory feedback loop going” plan ran straight into a brick wall. Not entirely surprising, but it happened sooner than I expected.

Chaos was the heart of the issue. If a butterfly can change the course of a hurricane by flapping its wings, then our uncertainty over the wing-flaps of all the world’s butterflies wipes out most of our long-term information about hurricane-trajectories. I believe this sort of phenomenon plays a central role in abstraction in practice: the “natural abstraction” is a summary of exactly the information which *isn’t* wiped out. So, my methods definitely needed to handle chaos. I knew that computing abstractions in linear systems was tractable, and expected to be able to extend that to at least *some* limited chaotic systems via local linear approximation. I figured something like a Lyapunov exponent could be calculated locally and used to deduce abstractions for some reasonable class of chaotic systems; the hope was that empirical investigation of those systems would be enough to get a foothold on more complex systems.

Alas, I did not understand just how central *nonlocality* is to chaos: we cannot tell what information chaos wipes out just by looking at small regions, and therefore we cannot tell what information chaos wipes out just by looking at linear approximations.

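Hand-wavy intuition for this: one defining feature of chaos is that a system’s state-trajectory eventually returns arbitrarily close to its starting point (though it never *exactly* returns, or the system would be cyclic rather than chaotic). So, picture something like this:

[Figure: a state-trajectory that makes one big loop and returns close to, but not exactly at, its starting point]

A local linear approximation looks at the trajectories within a small box, like this:

[Figure: the same trajectory with a small box drawn around a short local segment]

But it’s the behavior outside this small box (i.e. in the big loop) which makes the trajectory return arbitrarily close to its starting point:

[Figure: the same trajectory with the big loop outside the small box highlighted]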

In particular, that means **we can’t tell whether the system is chaotic just by looking at the small region**. Chaos is inherently nonlocal—we can only recognize it by looking at large-scale properties/behavior, not just a small box.

This, in turn, means that we can’t tell what information will be “wiped out” by chaos just by looking at a small box. The whole linear approximation approach is a nonstarter.

(Note: we can say some useful things by looking at the system locally, e.g. about how quickly trajectories diverge within the region. But not the things I need for calculating chaos-induced abstractions.)
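
As a small illustration of the kind of thing that *can* be measured locally, here is a minimal sketch using the logistic map with r = 4 as a stand-in chaotic system (my choice of example, not a system from the project). It averages the local stretching rate log|f'(x)| along a trajectory, and separately tracks how fast two nearby trajectories drift apart. For this particular map the long-run average is known to be ln 2. Note what the code does not do: nothing in it says which information survives the chaos.

```python
import math


def logistic(x: float, r: float = 4.0) -> float:
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def local_divergence_rate(x0: float, steps: int = 100_000, burn_in: int = 1_000) -> float:
    """Average of log|f'(x)| along a trajectory (a Lyapunov exponent estimate)."""
    x = x0
    for _ in range(burn_in):
        x = logistic(x)
    total = 0.0
    for _ in range(steps):
        slope = abs(4.0 * (1.0 - 2.0 * x))  # |f'(x)| for r = 4
        total += math.log(max(slope, 1e-300))  # guard against log(0) at x = 0.5
        x = logistic(x)
    return total / steps


def separations(x0: float, delta: float, steps: int) -> list[float]:
    """Distance between two trajectories that start `delta` apart."""
    a, b = x0, x0 + delta
    out = []
    for _ in range(steps):
        out.append(abs(a - b))
        a, b = logistic(a), logistic(b)
    return out


rate = local_divergence_rate(0.2)
gaps = separations(0.2, 1e-10, 60)
print(f"estimated divergence rate per step: {rate:.3f} (ln 2 = {math.log(2):.3f})")
print(f"gap after 0, 20, 40 steps: {gaps[0]:.1e}, {gaps[20]:.1e}, {gaps[40]:.1e}")
```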

## Back To The Drawing Board

With the linear approximation path dead, I no longer had an immediate, promising foothold for experiment or engineering. My dreams of a fast engineering-experiment-theory feedback loop were put on hold, and it was back to glacially slow theorizing.

The basic problem is **how to represent abstractions**. In general, we’re talking about probability distributions of some stuff given some other stuff “far away”. All of the stuff involved is fairly high-dimensional, so explicitly representing those distributions would require exponentially large amounts of space (like the chair example from earlier). And abstraction is inherently about large high-dimensional systems, so focussing specifically on small systems doesn’t really help.

On the other hand, presumably there *exist* more efficient data structures for abstractions—after all, the human brain does not have exponentially large amounts of space for representing all the abstractions we use in day-to-day life.

Since chaos was an obvious barrier, I went looking for generalizations of the mathematical tools we already use to represent abstractions in chaotic systems in practice—specifically the tools of statistical mechanics. The two big pieces there are:

Conserved quantities

Exponential-family (aka maximum entropy) distributions

Progress was much slower than I’d like, but I did end up with two remarkably powerful tools.

## Deterministic Constraints (a.k.a. Conserved Quantities) and the Telephone Theorem

My most exciting result of the last six months is definitely Information At A Distance Is Mediated By Deterministic Constraints, a.k.a. the Telephone Theorem. In its simplest form, it says that information-at-a-distance is like the game Telephone: all information is either perfectly conserved or completely lost in the long run. And, more interestingly, information can only be perfectly conserved when it is carried by deterministic constraints, i.e. quantities which are exactly equal between two parts of the system.

The original intuition behind this result comes from chaos: in (deterministic) chaotic systems, anything which is not a conserved quantity behaves “randomly” over time. (Here “randomly” means that the large-scale behavior becomes dependent on arbitrarily low-order bits of the initial conditions.) Intuitively: any information which is not perfectly conserved is lost. I wanted to generalize that to nondeterministic systems and make it more explicitly about “information” in the more precise sense used in information theory. I did a little math, and found that information is perfectly conserved only when it’s carried by deterministic constraints.

… and then I decided this was clearly a dead end. I was looking for results applicable to probabilistic systems, and this one apparently only applied to deterministic relationships. So I abandoned that line of inquiry.

Two and a half months later, I was lying on the couch with a notebook, staring at a diagram of nested Markov blankets, thinking that surely there must be something nontrivial to say about those damn Markov blankets. (This was not the first time I had that thought; many hours and days were spent ruminating on variations of that diagram.) It occurred to me that mutual information decreases as we move out through the layers (and is bounded below by zero), and therefore MI approaches a limit, at which point it *stops* decreasing (or at least decreases arbitrarily slowly). Which is an information conservation condition. Indeed, it was exactly the same information conservation condition I had given up on two and a half months earlier.
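
The "MI decreases, then levels off" picture can be checked on a toy example. The sketch below is my own illustration under simplifying assumptions, not the project's code: a 4-state Markov chain stands in for the sequence of nested layers, and its transition matrix `T` is block-diagonal, so the block a state belongs to (`x // 2`) is exactly equal at every step. That block label plays the role of the deterministic constraint; everything else about the initial state gets scrambled.

```python
import math

# Hypothetical transition matrix: states {0, 1} never mix with states {2, 3}.
T = [
    [0.9, 0.1, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.4, 0.6],
]
P0 = [0.25, 0.25, 0.25, 0.25]  # uniform distribution over the initial state


def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mutual_information_bits(p0, cond):
    """I(X_0; X_n) in bits, given P[X_0] and the n-step matrix P[X_n | X_0]."""
    n = len(p0)
    pn = [sum(p0[i] * cond[i][j] for i in range(n)) for j in range(n)]
    mi = 0.0
    for i in range(n):
        for j in range(n):
            joint = p0[i] * cond[i][j]
            if joint > 0:
                mi += joint * math.log2(joint / (p0[i] * pn[j]))
    return mi


def mi_along_chain(steps):
    """Mutual information between X_0 and X_n for n = 0..steps."""
    cond = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # n = 0
    out = []
    for _ in range(steps + 1):
        out.append(mutual_information_bits(P0, cond))
        cond = matmul(cond, T)
    return out


mis = mi_along_chain(30)
for n in (0, 1, 5, 30):
    print(f"I(X_0; X_{n}) = {mis[n]:.4f} bits")
```

The mutual information starts at the full 2 bits of the initial state, never increases, and settles at the 1 bit carried by the block label, which is the only thing the two ends of the chain deterministically share.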

Why am I excited about the Telephone Theorem? First and foremost: **finding deterministic constraints does not involve computing any high-dimensional integrals**. It just involves equation-solving/optimization (not exactly easy, in general, but much more tractable than integrals!). It also yields a natural data structure: if our constraint is $f_1(X_1) = f_2(X_2)$, then the functions $f_1$ and $f_2$ can represent the constraint. These are essentially “features” in our models; they summarize all the info from one chunk of variables relevant to another chunk far away. Such features are typically much more efficient to work with than full distributions.
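
For small discrete systems, a deterministic constraint can be found with no integrals at all. The sketch below is one standard approach and not necessarily what the project uses: treat every value of chunk 1 and every value of chunk 2 as a node, link two nodes whenever they co-occur with nonzero probability, and label the connected components. The result is a pair of lookup tables, one per chunk, whose values always agree. The function name `common_features` and the example table are hypothetical.

```python
def common_features(joint):
    """Find the finest pair of functions with f1[x1] == f2[x2] on the support.

    `joint` maps (x1, x2) pairs to probabilities. Returns two dicts that label
    each x1 value and each x2 value with a connected-component id.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Link x1 and x2 whenever they can occur together.
    for (x1, x2), prob in joint.items():
        if prob > 0:
            union(("left", x1), ("right", x2))

    labels, f1, f2 = {}, {}, {}
    for node in list(parent):
        label = labels.setdefault(find(node), len(labels))
        side, value = node
        (f1 if side == "left" else f2)[value] = label
    return f1, f2


# Hypothetical joint distribution over x1 in {0, 1, 2, 3} and x2 in {"a", "b", "c"}.
joint = {
    (0, "a"): 0.2, (1, "a"): 0.1, (1, "b"): 0.2,
    (2, "c"): 0.3, (3, "c"): 0.2,
}
f1, f2 = common_features(joint)
print(f1, f2)
```

Here the two features are just small dictionaries, which is the sense in which a constraint is a much cheaper data structure than the full joint table it was extracted from.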

Finally, we already know that deterministic constraints work great for characterizing distributions in chaotic systems—that’s exactly how Gibbs’ various ensembles work, and the empirical success of this approach in statistical mechanics speaks for itself. However, this approach is currently only used in statistical mechanics for “information far away” along the “time direction” (i.e. thermodynamic equilibrium approached over time); the Telephone Theorem generalizes the idea to arbitrary “directions”.

Major open questions here (you don’t need to follow all of these):

Do deterministic constraints in the limit have a common form—in particular infinite averages?

Is there a common form for the abstraction-distributions given the “features” from deterministic constraints? In particular, I suspect that the deterministic constraints yield the feature-functions in exponential-family distributions.

There’s an awful lot of possible sequences of Markov blankets, and therefore an awful lot of “features” relevant to things-far-away in different “directions”. Can that be compressed somehow? The Current Directions section below has a hypothesized general form which would handle this.

## Exponential Family Distributions and the Koopman-Pitman-Darmois Theorem

During the two-and-a-half month gap in which the deterministic constraints result was sitting there waiting for the final puzzle piece to click into place, I worked mainly on the exponential family angle, specifically generalizing the Koopman-Pitman-Darmois Theorem.

*Very* roughly speaking:

The Natural Abstraction Hypothesis says that far-apart chunks of the world are conditionally independent given some low-dimensional summaries.

The Koopman-Pitman-Darmois Theorem says that if a bunch of variables are conditionally independent given some low-dimensional summaries, then those variables follow an exponential-family distribution: a family which includes things like the normal distribution, uniform distribution (with fixed support), Poisson distribution, exponential distribution… basically most of the nice distributions you’d find in a statistical programming library.

Obvious hypothesis: the Natural Abstraction Hypothesis implies that far-apart chunks of the world follow an exponential-family distribution. Like the Telephone Theorem, this would dramatically narrow down the possible distributions we need to represent, and suggests a natural data structure: functions representing the “features” in the distribution. Also like the Telephone Theorem, those functions are typically much more efficient to work with algorithmically than full distributions.
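
To show what "features as a data structure" buys us, here is a minimal sketch for the simplest textbook case, i.i.d. normal samples. It illustrates the easy direction only (an exponential-family model has a fixed-size summary), not the Koopman-Pitman-Darmois theorem itself, which is the converse. The function names and the synthetic data are my own assumptions.

```python
import math
import random


def log_likelihood_full(xs, mu, sigma):
    """Log-likelihood of i.i.d. normal samples, touching every data point."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in xs
    )


def summarize(xs):
    """Fixed-size summary: the count plus the two 'feature' sums."""
    return len(xs), sum(xs), sum(x * x for x in xs)


def log_likelihood_from_summary(summary, mu, sigma):
    """Same log-likelihood, computed from the three summary numbers alone."""
    n, sum_x, sum_x2 = summary
    squared_error = sum_x2 - 2 * mu * sum_x + n * mu**2
    return -0.5 * n * math.log(2 * math.pi * sigma**2) - squared_error / (2 * sigma**2)


random.seed(0)
data = [random.gauss(2.0, 1.5) for _ in range(1000)]  # synthetic data
summary = summarize(data)
for mu, sigma in [(0.0, 1.0), (2.0, 1.5), (-1.0, 3.0)]:
    full = log_likelihood_full(data, mu, sigma)
    short = log_likelihood_from_summary(summary, mu, sigma)
    print(f"mu={mu}, sigma={sigma}: full={full:.6f}, from summary={short:.6f}")
```

The two feature sums (of x and of x squared) carry everything the thousand samples have to say about the parameters, and the summary stays the same size no matter how many samples come in.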

This exponential-family hypothesis also matches up nicely with empirical evidence: exponential family distributions (sometimes called “maximum entropy distributions”) are ubiquitous in statistical mechanics, and work great in practice for modelling exactly the sort of chaotic systems which I consider central examples of abstraction.

Unfortunately, the original Koopman-Pitman-Darmois theorem is too narrow to properly back up this hypothesis. And without knowing how the theorem generalizes, I wasn’t sure of exactly the right way to apply it—i.e. exactly what exponential family distributions to look for. So, I spent a couple months understanding the proof enough to generalize it, and writing up the result. Nothing too exciting, just fairly tedious mathematical legwork.

Even now, I’m still not fully sure of the right way to apply the generalized Koopman-Pitman-Darmois (gKPD) Theorem to abstractions; there’s more than one way to map the theorem’s variables to things in the “information-at-a-distance” picture. That said, combining gKPD with the Telephone Theorem gives a strong hint: **the “features” in our exponential-family distribution should probably be the deterministic constraint functions from the Telephone Theorem**. This is exactly what happens in statistical mechanics—again, Gibbs’ various ensembles are exponential-family distributions in which the features are deterministically-conserved quantities of the system (like energy, momentum or particle count).
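
The statistical-mechanics case can be made concrete with a tiny example. The sketch below assumes a hypothetical system with four energy levels and a known average energy. The maximum-entropy distribution subject to that one constraint has the Gibbs form, with probability proportional to exp(-beta * E), so the only thing left to do is solve for `beta`. The energy levels, the target value, and the function names are all illustrative.

```python
import math


def gibbs(energies, beta):
    """Exponential-family distribution whose single feature is the energy."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)  # partition function
    return [w / z for w in weights]


def mean_energy(energies, beta):
    """Expected energy under the Gibbs distribution at this beta."""
    return sum(p * e for p, e in zip(gibbs(energies, beta), energies))


def solve_beta(energies, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection on beta; mean energy is strictly decreasing in beta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > target:
            lo = mid  # mean too high, so beta must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


ENERGIES = [0.0, 1.0, 2.0, 3.0]  # hypothetical energy levels
TARGET_MEAN = 1.0                # hypothetical value of the constrained quantity

beta = solve_beta(ENERGIES, TARGET_MEAN)
probs = gibbs(ENERGIES, beta)
print(f"beta = {beta:.4f}")
print("probabilities:", [round(p, 4) for p in probs])
```

The whole distribution is pinned down by one feature function (the energy) and one number (`beta`), which is the compactness the exponential-family picture is after.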

## Current Directions

My current best guess is that abstractions on a low-level world $X^L$ all follow roughly the general form

$$P[X^L | X^H] = \frac{1}{Z} \, e^{\lambda^T(X^H) \sum_i f_i(X^L)} \, P[X^L | X^H_0]$$

… probably modulo a bounded, relatively-small number of exception terms. Notes on what this means:

The deterministic constraint functions between various Markov blankets are sub-sums of $\sum_i f_i(X^L)$.

The individual $f_i$ each depend only on local subsets of the low-level variables (so they don’t actually each depend on all of $X^L$).

$X^H_0$ is a fixed “reference value” of the high-level variables, so $P[X^L | X^H_0]$ is the low-level world distribution under some particular reference values of the high-level world variables.

All information-at-a-distance is zero in the reference distribution $P[X^L | X^H_0]$. So, the exponential term mediates all long-range interactions.

$\lambda$ should probably be such that $P[X^H | X^L]$ is a normal distribution. This is a guess based on what makes everything behave nicely.

I have rough mathematical arguments to support this via gKPD if we make some extra assumptions, but no general proof yet. It is *tantalizingly close* to algorithmic tractability, i.e. something I could code up and test empirically in reasonably-large simulations.

Next steps:

Characterize the possible forms of deterministic constraints. If (as I expect) infinite averages are the only non-finite possibilities, then the key question is *averages of what?* That, in turn, will tell us what the functions are in the hypothesized general form above, and how to find them.

Figure out some proofs for the general form, in particular what the exception terms look like. (Or, alternatively, disprove the general form.)

Work out efficient algorithms to discover and the corresponding from the low-level world model (represented as a Bayes Net/causal model). Then, test it all empirically.

It feels like I’m now very close to the first big milestone toward testing the Natural Abstraction Hypothesis: a program which can take in a low-level simulation of some system, and spit out the natural abstractions in that system.


A good review of work done, which shows that the writer is following their research plan and following up their pledge to keep the community informed.

The contents, however, are less relevant, and I expect that they will change as the project goes on. I.e. I think it is a great positive that this post exists, but it may not be worth reading for most people, unless they are specifically interested in research in this area. They should wait for the final report, be it positive or negative.