# Public Static: What is Abstraction?

Author’s Note: Most of the posts in this sequence are essentially a log of work-in-progress. This post is intended as a more presentable (“public”) and higher-confidence (“static”) write-up of some formalizations of abstraction. Much of the material has appeared in other posts; the first two sections in particular are drawn almost verbatim from the opening “What is Abstraction?” post.

• We have a gas consisting of some huge number of particles. We throw away information about the particles themselves, instead keeping just a few summary statistics: average energy, number of particles, etc. We can then make highly precise predictions about things like e.g. pressure just based on the reduced information we’ve kept, without having to think about each individual particle. That reduced information is the “abstract layer”—the gas and its properties.
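To make the gas example concrete, here is a minimal sketch (not from the original post; the particle count, mass, and speed scale are invented for illustration) of throwing away per-particle detail and predicting pressure from the summary statistics alone:

```python
import random

# Toy illustration: reduce a many-particle gas to a few summary statistics,
# then predict pressure from the summaries alone via the ideal gas law.
k_B = 1.380649e-23  # Boltzmann constant, J/K

def summarize(velocities, mass, volume):
    """Throw away per-particle detail; keep only a few summary statistics."""
    n = len(velocities)
    avg_ke = sum(0.5 * mass * (vx**2 + vy**2 + vz**2)
                 for vx, vy, vz in velocities) / n
    return {"count": n, "avg_ke": avg_ke, "volume": volume}

def predict_pressure(summary):
    """Ideal gas law P = N*k*T/V, using (3/2)*k*T = average kinetic energy."""
    temperature = (2.0 / 3.0) * summary["avg_ke"] / k_B
    return summary["count"] * k_B * temperature / summary["volume"]

random.seed(0)
mass = 6.6e-27  # roughly a helium atom, kg
sigma = 790.0   # per-axis thermal speed scale, m/s (invented)
velocities = [tuple(random.gauss(0.0, sigma) for _ in range(3))
              for _ in range(10_000)]

pressure = predict_pressure(summarize(velocities, mass, volume=1.0))
assert pressure > 0
```

Once the summary is computed, queries like “what happens to pressure if the volume doubles?” never touch the individual particles again.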

• We have a bunch of transistors and wires on a chip. We arrange them to perform some logical operation, like maybe a NAND gate. Then, we throw away information about the underlying details, and just treat it as an abstract logical NAND gate. Using just the abstract layer, we can make predictions about what outputs will result from what inputs. Note that there’s some fuzziness—0.01 V and 0.02 V are both treated as logical zero, and in rare cases there will be enough noise in the wires to get an incorrect output.
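A hedged sketch of the same idea in code, with made-up voltage levels and thresholds: the low level is noisy analog voltages, the high level is pure logic, and the digitized low-level behavior matches the abstract NAND:

```python
import random

# Toy illustration: a "NAND gate" at two levels of description. The voltage
# levels, threshold, and noise scale here are invented for the sketch.
V_HIGH, V_LOW = 5.0, 0.0
THRESHOLD = 2.5  # anything above this reads as logical 1

def low_level_nand(v_a, v_b, noise=0.05):
    """Analog 'NAND': output voltage is high unless both inputs read high."""
    a, b = v_a > THRESHOLD, v_b > THRESHOLD
    out = V_LOW if (a and b) else V_HIGH
    return out + random.gauss(0, noise)  # wire noise

def abstract_nand(a, b):
    """The abstract layer: pure logic, voltages thrown away."""
    return 0 if (a and b) else 1

def digitize(v):
    return 1 if v > THRESHOLD else 0

random.seed(1)
# The abstraction predicts the digitized low-level behavior despite fuzziness:
for v_a, v_b in [(0.01, 0.02), (4.9, 0.1), (5.1, 4.8)]:
    assert digitize(low_level_nand(v_a, v_b)) == abstract_nand(digitize(v_a), digitize(v_b))
```

With a much larger `noise` parameter, the assertion would occasionally fail, which is exactly the “rare incorrect output” the text mentions.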

• I tell my friend that I’m going to play tennis. I have ignored a huge amount of information about the details of the activity—where, when, what racket, what ball, with whom, all the distributions of every microscopic particle involved—yet my friend can still make some reliable predictions based on the abstract information I’ve provided.

• When we abstract formulas like “1+1=2*1” and “2+2=2*2” into “n+n=2*n”, we’re obviously throwing out information about the value of n, while still making whatever predictions we can given the information we kept. This is what abstraction is all about in math and programming: throw out as much information as you can, while still maintaining the core “prediction”—i.e. the theorem or algorithm.

• I have a street map of New York City. The map throws out lots of info about the physical streets: street width, potholes, power lines and water mains, building facades, signs and stoplights, etc. But for many questions about distance or reachability on the physical city streets, I can translate the question into a query on the map. My query on the map will return reliable predictions about the physical streets, even though the map has thrown out lots of info.
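The map example can be sketched as a graph query; the street names and distances below are invented for illustration:

```python
# Toy illustration: a "map" as a graph that throws away street detail but
# still answers distance/reachability queries via shortest paths.
from heapq import heappush, heappop

street_map = {  # node -> {neighbor: distance in miles} (numbers invented)
    "Times Square": {"Columbus Circle": 0.8, "Grand Central": 0.9},
    "Columbus Circle": {"Times Square": 0.8, "Central Park South": 0.2},
    "Grand Central": {"Times Square": 0.9},
    "Central Park South": {"Columbus Circle": 0.2},
}

def shortest_distance(graph, start, goal):
    """Dijkstra's algorithm over the abstracted street graph."""
    dist, frontier = {start: 0.0}, [(0.0, start)]
    while frontier:
        d, node = heappop(frontier)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heappush(frontier, (nd, nbr))
    return float("inf")

# Query the map instead of the physical streets:
assert abs(shortest_distance(street_map, "Times Square", "Central Park South") - 1.0) < 1e-9
```

Potholes, facades, and atoms never enter the computation, yet the distance prediction is reliable.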

The general pattern: there’s some ground-level “concrete” model (or territory), and an abstract model (or map). The abstract model throws away or ignores information from the concrete model, but in such a way that we can still make reliable predictions about some aspects of the underlying system.

Notice that the predictions of the abstract models, in most of these examples, are not perfectly accurate. We’re not dealing with the sort of “abstraction” we see in e.g. programming or algebra, where everything is exact. There are going to be probabilities involved.

In the language of embedded world-models, we’re talking about multi-level models: models which contain both a notion of “table”, and of all the pieces from which the table is built, and of all the atoms from which the pieces are built. We want to be able to use predictions from one level at other levels (e.g. predict bulk material properties from microscopic structure and/or macroscopic measurements, or predict from material properties whether it’s safe to sit on the table), and we want to move between levels consistently.

## Formalization: Starting Point

To repeat the intuitive idea: an abstract model throws away or ignores information from the concrete model, but in such a way that we can still make reliable predictions about some aspects of the underlying system.

So to formalize abstraction, we first need some way to specify which “aspects of the underlying system” we wish to predict, and what form the predictions take. The obvious starting point for predictions is probability distributions. Given that our predictions are probability distributions, the natural way to specify which aspects of the system we care about is via a set of events or logic statements for which we calculate probabilities. We’ll be agnostic about the exact types for now, and just call these “queries”.

That leads to a rough construction. We start with some low-level model M and a set of queries Q. From these, we construct a minimal high-level model M* by keeping exactly the information relevant to the queries, and throwing away all other information. By the minimal map theorems, we can represent M* directly by the full set of probabilities {P[q|M]} for q in Q; M* and {P[q|M]} contain exactly the same information. Of course, in practical examples, the probabilities will usually have some more compact representation, and will usually contain some extraneous information as well.
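A minimal sketch of this construction (the three-coin “low-level model” and the query names are invented for the example): the high-level model is literally just the table of query probabilities:

```python
# Sketch: given a low-level distribution and a query set Q, the minimal
# high-level model keeps exactly the probabilities P[q] and nothing else.
from itertools import product

# Low-level "model": a uniform joint distribution over three coins.
low_level_states = list(product([0, 1], repeat=3))
p_state = {s: 1 / 8 for s in low_level_states}

# Queries we care about: events, represented as predicates on states.
queries = {
    "at least two heads": lambda s: sum(s) >= 2,
    "first coin is heads": lambda s: s[0] == 1,
}

def build_high_level_model(p_state, queries):
    """Keep exactly the information needed to answer the queries."""
    return {name: sum(p for s, p in p_state.items() if q(s))
            for name, q in queries.items()}

high_level = build_high_level_model(p_state, queries)
assert high_level["at least two heads"] == 0.5
assert high_level["first coin is heads"] == 0.5
```

Everything else about the eight low-level states (e.g. correlations between the second and third coin) has been thrown away.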

To illustrate a bit, let’s identify the low-level model, class of queries, and high-level model for a few of the examples from earlier.

• Ideal Gas:

• Low-level model is the full set of molecules, their interaction forces, and a distribution representing our knowledge about their initial configuration.

• Class of queries consists of combinations of macroscopic measurements, e.g. one query might be “pressure = 12 torr & volume = 1 m^3 & temperature = 110 K”.

• For an ideal gas, the high-level model can be represented by e.g. temperature, number of particles (of each type if the gas is mixed), and container volume. Given these values and assuming a near-equilibrium initial configuration distribution, we can predict the other macroscopic measurables in the queries (e.g. pressure).

• Tennis:

• Low-level model is the full microscopic configuration of me and the physical world around me as I play tennis (or whatever else I do).

• Class of queries is hard to sharply define at this point, but includes things like “John will answer his cell phone in the next hour”, “John will hold a racket and hit a fuzzy ball in the next hour”, “John will play Civ for the next hour”, etc—all the things whose probabilities change on hearing that I’m going to play tennis.

• High-level model is just the sentence “I am going to play tennis”.

• Street Map:

• Low-level model is the physical city streets.

• Class of queries includes things like “shortest path from Times Square to Central Park starts by following Broadway”, “distance between the Met and the Hudson is less than 1 mile”, etc—all the things we can deduce from a street map.

• High-level model is the map. Note that the physical map also includes some extraneous information, e.g. the positions of all the individual atoms in the piece of paper/smartphone.

Already with the second two examples there seems to be some “cheating” going on in the model definition: we just define the query class as all the events/logic statements whose probabilities change based on the information in the map. But if we can do that, then anything can be a “high-level map” of any “low-level territory”, with the queries taken to be the events/statements about the territory which the map actually has some information about—not a very useful definition!

## Information About Things “Far Away”

In order for abstraction to actually be useful, we need some efficient way to know which queries the abstract model can accurately answer, without having to directly evaluate each query within the low-level model.

In practice, we usually seem to have a notion of which variables are “far apart”, in the sense that any interactions between the two are mediated by many in-between variables.

The mediating variables are noisy, so they wipe out most of the “fine-grained” information present in the variables of interest. We can therefore ignore that fine-grained information when making predictions about things far away. We just keep around whatever high-level signal makes it past the noise of mediating variables, and throw out everything else, so long as we’re only asking questions about far-away variables.

An example: when I type “4+3” in a python shell, I think of that as adding two numbers, not as a bunch of continuous voltages driving electric fields and current flows in little patches of metal and doped silicon. Why? Because, if I’m thinking about what will show up on my monitor after I type “4+3” and hit enter, then the exact voltages and current flows on the CPU are not relevant. This remains true even if I’m thinking about the voltages driving individual pixels in my monitor—even at a fairly low level, the exact voltages in the arithmetic-logic unit on the CPU aren’t relevant to anything more than a few microns away—except for the high-level information contained in the “numbers” passed in and out. Information about exact voltages in specific wires is quickly wiped out by noise within the chip.

Another example: if I’m an astronomer predicting the trajectory of the sun, then I’m presumably going to treat other stars as point-masses. At such long distances, the exact mass distribution within the star doesn’t really matter—except for the high-level information contained in the total mass, momentum and center-of-mass location.

Formalizing this in the same language as the previous section:

• We have some variables X and Y in the low-level model.

• Interactions between X and Y are mediated by noisy variables Z.

• Noise in Z wipes out most fine-grained information about X, so only the high-level summary f(X) is relevant to Y.

Mathematically: P[Y|f(X)] = P[Y|X] for any Y which is “not too close” to X, i.e. any Y which does not overlap with Z (or with X itself). Our high-level model replaces X with f(X), and our set of valid queries is the whole joint distribution of Y given f(X).
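This condition can be checked numerically in a toy chain X → Z → Y, built (by construction) so that the noisy mediator Z only reads the summary f(X); all the distributions below are invented for the check:

```python
# Numerical check of P[Y|X] = P[Y|f(X)] in a toy chain X -> Z -> Y, where
# the noisy mediator Z depends on X only through the summary f(X) = X mod 2.
from itertools import product
from fractions import Fraction

F = Fraction
xs, zs, ys = [0, 1, 2, 3], [0, 1], [0, 1]
p_x = {x: F(1, 4) for x in xs}
f = lambda x: x % 2  # high-level summary

# Z is a noisy readout of f(X); Y is a noisy readout of Z (numbers invented).
p_z_given_fx = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(2, 10), 1: F(8, 10)}}
p_y_given_z = {0: {0: F(7, 10), 1: F(3, 10)}, 1: {0: F(4, 10), 1: F(6, 10)}}

joint = {(x, z, y): p_x[x] * p_z_given_fx[f(x)][z] * p_y_given_z[z][y]
         for x, z, y in product(xs, zs, ys)}

def p_y_given(cond):
    """P[Y=1 | cond(X)] computed from the low-level joint distribution."""
    num = sum(p for (x, z, y), p in joint.items() if cond(x) and y == 1)
    den = sum(p for (x, z, y), p in joint.items() if cond(x))
    return num / den

# Conditioning on the full low-level X predicts the far-away Y no better
# than conditioning only on the summary f(X):
for x0 in xs:
    assert p_y_given(lambda x: x == x0) == p_y_given(lambda x: f(x) == f(x0))
```

Using exact `Fraction` arithmetic makes the equality exact rather than approximate, which is why no tolerance is needed.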

Now that we have two definitions, it’s time to start the Venn diagram of definitions of abstraction.

So far, we have:

• A high-level model throws out information from a low-level model in such a way that some set of queries Q can still be answered correctly: the high-level model keeps exactly the probabilities {P[q|M]} for q in Q.

• A high-level summary throws out information from some variable X in such a way that all information about “far away” variables Y is kept: P[Y|f(X)] = P[Y|X].

## Systems View

The definition in the previous section just focuses on abstracting a single variable X. In practice, we often want to take a system-level view, abstracting a whole bunch of low-level variables (or sets of low-level variables) all at once. This doesn’t involve changing the previous definition, just applying it to many variables in parallel.

Rather than just one variable of interest X, we have many low-level variables (or non-overlapping sets of variables) X_1, …, X_n and their high-level summaries f_1(X_1), …, f_n(X_n). For each of the X_i, we have some set of variables “nearby” Z_i, which mediate its interactions with everything else. Our “far-away” variables Y are now any far-away X_j’s, so we want

P[X_J | f_I(X_I)] = P[X_J | X_I]

for any sets of indices I and J which are “far apart”—meaning that X_I and Z_I do not overlap any X_J or Z_J.

(Notation: I will use lower-case indices like i, j for individual variables, and upper-case indices like I, J to represent sets of variables; f_I(X_I) denotes the tuple of summaries f_i(X_i) for i in I. I will also treat any single index interchangeably with the set containing just that index.)

For instance, if we’re thinking about wires and transistors on a CPU, we might look at separate chunks of circuitry. Voltages in each chunk of circuitry are X_i, and f_i(X_i) summarizes the binary voltage values. Z_i are voltages in any components physically close to chunk i on the chip. Anything physically far away on the chip will depend only on the binary voltage values in the components, not on the exact voltages.

The main upshot of all this is that we can rewrite the math in a cleaner way: as a (partial) factorization. Each of the low-level components is conditionally independent of the others given the high-level summaries, so:

P[X_J | f_J(X_J)] = ∏_{j in J} P[X_j | f_j(X_j)]

This condition only needs to hold when J picks out indices which are pairwise far apart (i.e. we pick out a subset of the X_j’s such that no two are “close together”). Note that we can pick any set of indices which satisfies this condition—so we really have a whole family of factorizations of marginal distributions in which no two variables are “close together”. See the appendix to this post for a proof of the formula.

In English: any set of low-level variables X_J which are all “far apart” are independent given their high-level summaries f_J(X_J).

Intuitively: we pick some set of low-level variables which are all far apart, and compute their summaries. By construction, we have a model in which each of the high-level variables is a leaf in the graphical model, determined only by the corresponding low-level variables. But thanks to the abstraction condition, we can independently swap any subset of the summaries with their corresponding low-level variables—assuming that all of them are “far apart”.

Returning to the digital circuit example: if we pick any subset of the wires and transistors on a chip, such that no two are too physically close together, then we expect that their exact voltages are roughly independent given the high-level summary of their digital values.
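A toy numerical check of this factorization, with two “chunks” whose digital bits are correlated but whose analog jitter is local; all the probabilities here are invented for the check:

```python
# Check that two far-apart circuit "chunks" have exact voltages which are
# independent given their digital summaries, i.e. the factorization
# P[v1, v2 | b1, b2] = P[v1 | b1, b2] * P[v2 | b1, b2] holds exactly.
from fractions import Fraction

F = Fraction

def digitize(v):
    return 1 if v > F(5, 2) else 0

# The chunks' digital bits may be correlated (they compute related things):
p_bits = {(0, 0): F(4, 10), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(4, 10)}
# Analog jitter within each chunk is local noise, independent given the bit:
jitter = {0: {F(0): F(1, 2), F(1, 10): F(1, 2)},
          1: {F(49, 10): F(1, 2), F(5): F(1, 2)}}

joint = {}
for (b1, b2), pb in p_bits.items():
    for v1, q1 in jitter[b1].items():
        for v2, q2 in jitter[b2].items():
            joint[(v1, v2)] = joint.get((v1, v2), F(0)) + pb * q1 * q2

def prob(cond):
    return sum(q for (v1, v2), q in joint.items() if cond(v1, v2))

def factorization_holds():
    """Voltages are conditionally independent given the digital values."""
    for (v1s, v2s) in joint:
        b1, b2 = digitize(v1s), digitize(v2s)
        p_b = prob(lambda v1, v2: (digitize(v1), digitize(v2)) == (b1, b2))
        lhs = prob(lambda v1, v2: (v1, v2) == (v1s, v2s)) / p_b
        rhs = (prob(lambda v1, v2: v1 == v1s and digitize(v2) == b2) / p_b) \
            * (prob(lambda v1, v2: digitize(v1) == b1 and v2 == v2s) / p_b)
        if lhs != rhs:
            return False
    return True

assert factorization_holds()
```

The bits carry all the cross-chunk interaction; the jitter is purely local, so conditioning on the summaries screens the chunks off from each other.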

We’ll add this to our Venn diagram as an equivalent formulation of the previous definition.

I have found this formulation to be the most useful starting point in most of my own thinking, and it will be the jumping-off point for our last two notions of abstraction in the next two sections.

## Causality

So far we’ve only talked about “queries” on the joint distribution of variables. Another natural step is to introduce causal structure into the low-level model, and require interventional queries to hold on far apart variables.

There are some degrees of freedom in which interventional queries hold on far apart variables. One obvious answer is “all of them”:

P[X_J | f_J(X_J), do(X_I = x_I)] = ∏_{j in J} P[X_j | f_j(X_j), do(X_I = x_I)]

… with the same conditions on I and J as before, plus the added condition that the indices in I and J also be far apart. This is the usual requirement in math/programming abstraction, but it’s too strong for many real-world applications. For instance, when thinking about fluid dynamics, we don’t expect our abstractions to hold when all the molecules in a particular cell of space are pushed into the corner of that cell. Instead, we could weaken the low-level intervention to sample from low-level states compatible with the high-level intervention: replace do(X_I = x_I) with do(X_I ~ P[X_I | f_I(X_I) = s]), i.e. the low-level intervention draws X_I from the states compatible with the high-level intervention f_I(X_I) = s.

We could even have low-level interventions sample from some entirely different distribution, to reflect e.g. a physical machine used to perform the interventions.
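As a hedged sketch of the deterministic case: a one-gate “circuit” in which every low-level intervention compatible with a high-level intervention do(bit_in = b) produces exactly the output bit the high-level model predicts (the voltage levels are invented):

```python
# Sketch of causal abstraction for a one-gate "circuit": any low-level
# intervention do(v_in = v) compatible with the high-level intervention
# do(bit_in = b) yields the output bit the high-level model predicts.
def digitize(v):
    return 1 if v > 2.5 else 0

def low_level_inverter(v_in):
    """Structural equation: a saturating inverter at the voltage level."""
    return 0.2 if v_in > 2.5 else 4.8

def high_level_inverter(bit_in):
    """The abstract layer: a logical NOT."""
    return 1 - bit_in

for b in (0, 1):
    # All low-level input settings compatible with do(bit_in = b):
    compatible = [v / 10 for v in range(0, 51) if digitize(v / 10) == b]
    for v in compatible:
        assert digitize(low_level_inverter(v)) == high_level_inverter(b)
```

Had we instead intervened at some voltage the abstraction does not cover (say, driving the wire far outside its design range), the high-level prediction would carry no guarantee, which is the point of restricting which interventional queries must hold.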

Another post will talk more about this, but it turns out that we can say quite a bit about causal abstraction while remaining agnostic to the details of the low-level interventions. Any of the above interventional query requirements have qualitatively-similar implications, though obviously some are stronger than others.

In day-to-day life, causal abstraction is arguably more common than non-causal. In fully deterministic problems, validity of interventional queries is essentially the only constraint (though often in guises which do not explicitly mention causality, e.g. functional behavior or logic). For instance, suppose I want to write a python function to sort a list. The only constraint is the abstract input/output behavior, i.e. the behavior of the designated “output” under interventions on the designated “inputs”. The low-level details—i.e. the actual steps performed by the algorithm—are free to vary, so long as those high-level interventional constraints are satisfied.
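The sorting example in code: two very different low-level implementations (sketched here, not from the post) satisfy the same abstract input/output constraint under arbitrary “interventions” on the input:

```python
import random

# Two low-level implementations of the same high-level specification.
def bubble_sort(xs):
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def merge_sort(xs):
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

# "Interventions" on the input: whatever input we set, both implementations
# must produce the same abstract output. The internal steps are free to vary.
random.seed(2)
for _ in range(100):
    xs = [random.randrange(100) for _ in range(random.randrange(20))]
    assert bubble_sort(xs) == merge_sort(xs) == sorted(xs)
```

This is essentially a property-based test: the specification constrains only the behavior of the output under interventions on the input.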

This generalizes to other design/engineering problems: the desired behavior of a system is usually some abstract, high-level behavior under interventions. Low-level details are free to vary so long as the high-level constraints are satisfied.

## Exact Abstraction

Finally, one important special case. In math and programming, we typically use abstractions with sharper boundaries than most of those discussed here so far. Prototypical examples:

• A function in programming: behavior of everything outside the function is independent of the function’s internal variables, given a high-level summary containing only the function’s inputs and outputs. Same for private variables/methods of a class.

• Abstract algebra: many properties of mathematical objects hold independent of the internal details of the object, given certain high-level summary properties—e.g. the group axioms, or the ring axioms, or …

• Interfaces for abstract data structures: the internal organization of the data structure is irrelevant to external users, given the abstract “interface”—a high-level summary of the object’s behavior under different inputs (a.k.a. different interventions).
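A small sketch of such an interface boundary (the stack classes are invented for illustration): two implementations with different internals are indistinguishable to an outside user who only touches the interface:

```python
# Sketch of "exact abstraction" as an interface: external behavior depends
# on the internal organization only through the interface summary.
class ListStack:
    def __init__(self):
        self._items = []          # internal detail: a Python list
    def push(self, x):
        self._items.append(x)
    def pop(self):
        return self._items.pop()

class LinkedStack:
    def __init__(self):
        self._head = None         # different internal detail: linked cells
    def push(self, x):
        self._head = (x, self._head)
    def pop(self):
        x, self._head = self._head
        return x

def drain(stack, items):
    """An 'outside user' that only touches the interface."""
    for x in items:
        stack.push(x)
    return [stack.pop() for _ in items]

# Same high-level summary (interface behavior), different low-level details:
assert drain(ListStack(), [1, 2, 3]) == drain(LinkedStack(), [1, 2, 3]) == [3, 2, 1]
```

Here the boundary is sharp: no noise, no “far away”; internals interact with the outside only via push/pop.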

In these cases, there are no noisy intermediate variables, and no notion of “far away” variables. There’s just a hard boundary: the internal details of high-level abstract objects do not interact with things of interest “outside” the object except via the high-level summaries.

We can easily cast this as a special case of our earlier notion of abstraction: the set of noisy intermediate variables Z is empty. The “high-level summary” of the low-level variables contains all information relevant to any variables outside of themselves.

Of course, exact abstraction overlaps quite a bit with causal abstraction. Exact abstractions in math/programming are typically deterministic, so they’re mainly constrained by interventional predictions rather than distributional predictions.

## Summary

We started with a very general notion of abstraction: we take some low-level model and abstract it into a high-level model by throwing away information in such a way that we can still accurately answer some queries. This is extremely general, but in order to actually be useful, we need some efficient way to know which queries are and are not supported by the abstraction.

That brought us to our next definition: abstraction keeps information relevant to “far away” variables. We imagine that interactions between the variable-to-be-abstracted X and things far away are mediated by some noisy “nearby” variables Z, which wipe out most of the information in X. So, we can support all queries on things far away by keeping only a relatively small summary f(X).

Applying this definition to a whole system, rather than just one variable, we find a clean formulation: all sets of far-apart low-level variables are independent given the corresponding high-level summaries.

Next, we extended this to causal abstraction by requiring that interventional queries also be supported.

Finally, we briefly mentioned the special case in which there are no noisy intermediate variables, so the abstraction boundary is sharp: there’s just the variables to be abstracted, and everything outside of them. This is the usual notion of abstraction in math and programming.

## Appendix: System Formulation Proof

We start with two pieces. By construction, f_I(X_I) is calculated entirely from X_I, so

P[X_J | X_I, f_I(X_I)] = P[X_J | X_I]  (construction)

… without any restriction on which subsets of the variables we look at. Then we also have the actual abstraction condition

P[X_J | X_I] = P[X_J | f_I(X_I)]  (abstraction)

… as long as X_I and Z_I do not overlap X_J or Z_J.

We want to show that

P[X_J | f_J(X_J)] = ∏_{j in J} P[X_j | f_j(X_j)]

… for any set J of non-nearby variables (i.e. all indices in J pairwise “far apart”). In English: sets of far-apart low-level variables are independent given their high-level counterparts.

Let’s start with definitions of “far apart” and “nearby”, so we don’t have to write them out every time:

• Two sets of indices I and J are “far apart” if X_I and Z_I do not overlap X_J, and vice versa. Individual indices can be treated as sets containing one element for purposes of this definition—so e.g. two indices, or an index and a set of indices, could be “far apart”.

• Indices and/or sets of indices are “nearby” if they are not far apart.

As before, I will use capital letters for sets of indices and lower-case letters for individual indices, and I won’t distinguish between a single index and the set containing just that index.

With that out of the way, we’ll prove a lemma:

P[X_J | X_I, f_K(X_K)] = P[X_J | f_I(X_I), f_K(X_K)]

… for any I and K far apart from J (though I and K need not be far apart from each other). This lets us swap high-level with low-level given variables as we wish, so long as they’re all far apart from the query variables. Proof:

P[X_J | X_I, X_K, f_I(X_I), f_K(X_K)] = P[X_J | X_I, X_K]  (by construction)

P[X_J | X_I, X_K] = P[X_J | f_{I∪K}(X_{I∪K})]  (by abstraction, applied to the combined index set I∪K)

P[X_J | f_{I∪K}(X_{I∪K})] = P[X_J | f_I(X_I), f_K(X_K)]  (by construction, since f_{I∪K}(X_{I∪K}) is just the pair (f_I(X_I), f_K(X_K)))

By taking the expectation of both sides over X_K conditional on X_I and f_K(X_K), and then marginalizing out unused variables, this becomes

P[X_J | X_I, f_K(X_K)] = P[X_J | f_I(X_I), f_K(X_K)]

That’s the first half of our lemma. Other half:

P[X_J | f_I(X_I), X_K] = P[X_J | f_I(X_I), f_K(X_K)]

Proof:

P[X_J | f_I(X_I), X_K] = Σ_{x_I} P[X_J | X_I = x_I, X_K] P[X_I = x_I | f_I(X_I), X_K]  (by Bayes)

= Σ_{x_I} P[X_J | f_I(x_I), f_K(X_K)] P[X_I = x_I | f_I(X_I), X_K]  (by the first half, before marginalizing)

= P[X_J | f_I(X_I), f_K(X_K)]  (by Bayes: the first factor is constant across the support, and the weights sum to one)

That takes care of the lemma.

Armed with the lemma, we can finish the main proof by iterating through the variables inductively:

P[X_j, X_J | f_j(X_j), f_J(X_J)]

= P[X_j | X_J, f_j(X_j), f_J(X_J)] P[X_J | f_j(X_j), f_J(X_J)]  (by Bayes)

= P[X_j | X_J, f_j(X_j)] P[X_J | f_j(X_j), f_J(X_J)]  (by construction)

= P[X_j | f_J(X_J), f_j(X_j)] P[X_J | f_j(X_j), f_J(X_J)]  (by lemma)

= P[X_j | f_J(X_J), f_j(X_j)] P[f_j(X_j) | X_J, f_J(X_J)] P[X_J | f_J(X_J)] / P[f_j(X_j) | f_J(X_J)]  (by Bayes)

= P[X_j | f_J(X_J), f_j(X_j)] P[X_J | f_J(X_J)]  (by lemma & cancellation, since P[f_j(X_j) | X_J, f_J(X_J)] = P[f_j(X_j) | f_J(X_J)])

= P[X_j | f_j(X_j)] P[X_J | f_J(X_J)]  (by Bayes, applying the same cancellation to the first factor)

Here j and all the indices in J are far apart from each other. Starting with empty J and applying this formula to each variable X_j, one-by-one, completes the proof.

• Planned summary for the Alignment Newsletter:

If we are to understand embedded agency, we will likely need to understand abstraction (see <@here@>(@Embedded Agency via Abstraction@)). This post presents a view of abstraction in which we abstract a low-level territory into a high-level map that can still make reliable predictions about the territory, for some set of queries (whether probabilistic or causal).

For example, in an ideal gas, the low-level configuration would specify the position and velocity of _every single gas particle_. Nonetheless, we can create a high-level model where we keep track of things like the number of molecules, average kinetic energy of the molecules, etc, which can then be used to predict things like pressure exerted on a piston.

Given a low-level territory L and a set of queries Q that we’d like to be able to answer, the minimal-information high-level model stores P(Q | L) for every possible Q and L. However, in practice we don’t start with a set of queries and then come up with abstractions; we instead develop crisp, concise abstractions that can answer many queries. One way we could develop such abstractions is by only keeping information that is visible “far away”, and throwing away information that would be wiped out by noise. For example, when typing 3+4 into a calculator, the exact voltages in the circuit don’t affect anything more than a few microns away, except for the final result 7, which affects the broader world (e.g. via me seeing the answer).

If we instead take a systems view of this, where we want abstractions of multiple different low-level things, then we can equivalently say that two far-away low-level things should be independent of each other _when given their high-level summaries_, which are supposed to be able to quantify all of their interactions.

Planned opinion:

I really like the concept of abstraction, and think it is an important part of intelligence, and so I’m glad to get better tools for understanding it. I especially like the formulation that low-level components should be independent given high-level summaries—this corresponds neatly to the principle of encapsulation in software design, and does seem to be a fairly natural and elegant description, though of course abstractions in practice will only approximately satisfy this property.
• LGTM

• I wonder whether it’d be useful to distinguish between the following things.

Consider the example of the street map. The high-level model is the exact same thing as the low-level model except that there is detail removed (in a loose sense at least; in practice the map will probably have small differences compared to the territory).

Now consider the example of an ideal gas. The high-level model throws away all of the stuff in the low-level model and replaces it with summary statistics.

Both fit the definition of abstraction because you’re removing information but maintaining the ability to answer questions, but in the ideal gas example you’re adding something new in. Namely the summary statistics.

(Well, maybe “new” isn’t the right word. Maybe it’d be better to say “adding summary statistics”. I guess the thing I’m really trying to point at is the fact that something is being added.)

• For the street map example, the low-level model is the physical city streets—which means it’s the molecules/atoms/fields which comprise the streets. When we represent the streets as lines on paper, those lines are summary statistics of molecule positions, just like the ideal gas example. The only difference is that in the ideal gas example, it’s a lot easier to express the relevant distribution which the statistics summarize.

That said, I do think you’re pointing to something interesting. There is a sense in which a high-level model adds something in.

Look at the factorizations in the “Systems View” section. They are factorizations of a joint distribution over both the high-level and low-level variables. We have a single model which includes both sets of variables. The high-level variables are quite literally added into the low-level model as new variables computed from the old. The high-level model then keeps those new variables, and throws away all the original low-level variables.

• Ah, I think I see what you mean. That makes sense that the high-level model of the street map is also a summary statistic, not just the low-level model with stuff thrown away. Let me try to refine my comment.

For the ideal gas example, I think of the low-level model as looking something like this:

```java
class LowLevelGas {
    Particle[] particles;
}

class Particle {
    String compound;
    int speed;
    int direction;
    int mass;
    // whatever else
}
```


And I think of the high-level model as looking like this:

```java
class HighLevelGas {
    int pressure;
    int volume;
    int temperature;
}
```


LowLevelGas and HighLevelGas just look like there’s a big difference between the two. On the other hand, LowLevelStreetMap and HighLevelStreetMap wouldn’t look as different. It’d be analogous to a sketch vs a photograph, where the difference is sort of a matter of resolution. But with LowLevelGas and HighLevelGas, it seems like they are different in a more fundamental way. They have different properties, not the same properties at different resolutions.

I wonder if this “resolution” idea can be made more formal. Something along the lines of looking at the high-level variables and low-level variables and seeing how… similar?… they are.

Eliezer’s idea of Thingspace comes to mind. In theory maybe you could look at how close they are in Thingspace, but in practice that seems really difficult.

• I’ll explain what I think you’re pointing to here, then let me know if it sounds like what you’re imagining.

In the street map example, everything is spatially delineated, at both the high and low level. The abstraction can be interpreted as blurring the spatial resolution. The spatial resolution is a conserved structure between the two. But in the gas example, everything is blurred “all the way”, so there’s just a single high-level object which doesn’t retain any structure which is obviously-similar to the low-level.

(An intermediate between these two would be Navier-Stokes: it applies basically the same abstraction as the ideal gas, but only within small spatially-delineated cells.)

So there’s potentially a difference between abstractions which throw away basically all the structure vs abstractions which retain some.

This points toward a more general class of questions: when, and to what extent, does it all add up to normality? We learned the high-level ideal gas laws long before we learned the low-level molecular theory, but we knew the low-level had to at least be consistent with that high-level structure. What low-level structures did that constraint exclude? More generally: to what extent does our knowledge of the high-level model structure constrain the possible low-level structures?

One good class of structure for these sorts of questions is causal structure: to what extent does high-level causal structure constrain the possible low-level causal structures? I’ll probably have a post on that soon-ish.

• So there’s potentially a difference between abstractions which throw away basically all the structure vs abstractions which retain some.

Yeah, that’s what I’m getting at.

• This points toward a more general class of questions: when, and to what extent, does it all add up to normality? We learned the high-level ideal gas laws long before we learned the low-level molecular theory, but we knew the low-level had to at least be consistent with that high-level structure. What low-level structures did that constraint exclude? More generally: to what extent does our knowledge of the high-level model structure constrain the possible low-level structures?

One good class of structure for these sorts of questions is causal structure: to what extent does high-level causal structure constrain the possible low-level causal structures? I’ll probably have a post on that soon-ish.

Doesn’t high-level structure entail statistical averages and not necessarily Boltzmann brains in the low-level structure? Like, what of the nonequilibrium statistical mechanics?

• Prob­lem is, we didn’t know be­fore­hand (i.e. in 1800) that the high-level things we saw (like tem­per­a­tures, heat flow, etc) had any­thing to do with statis­ti­cal av­er­ages. One could imag­ine an al­ter­na­tive uni­verse run­ning on differ­ent physics, where heat re­ally is a fluid and yet macro­scop­i­cally it be­haves a lot like heat in our uni­verse. If we imag­ine all the differ­ence ways things could have turned out to work, given only what we knew in 1800, where does that leave us? What low-level struc­ture is im­plied by the high-level struc­ture?

• (I have a sense that the an­swer to this ques­tion is in the post but I’m hav­ing trou­ble ex­tract­ing it out.)

There’s something that I think of as composition, and I’m not sure whether it fits the definition of abstraction. Consider, in the context of programming, a User that has email and password properties. We think of User as an abstraction: it’s a thing composed of an email and a password. I’m not seeing how this fits the definition of abstraction, though. In particular, what information is being thrown away? What is the low-level model, and what is the high-level model?

The User example demonstrates composition of properties, but you could also have composition of instructions. For example, setPassword might consist of 1) saltPassword, 2) hashPassword, and then 3) savePassword. Here we’d say that setPassword is an abstraction, but what is the information that is being thrown away?
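The two compositions above could be sketched like this (a hypothetical illustration: the salting and hashing helpers are toy stand-ins, not a real auth scheme):

```typescript
// Toy stand-in for applying a salt.
function saltPassword(password: string): string {
  return "salt:" + password;
}

// Toy stand-in for a real cryptographic hash.
function hashPassword(salted: string): string {
  let h = 0;
  for (const c of salted) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h.toString(16);
}

class User {
  // Composition of properties: an email plus a stored password.
  private storedPassword = "";
  constructor(public email: string) {}

  // Composition of instructions: salt, then hash, then save.
  setPassword(password: string): void {
    const salted = saltPassword(password); // 1) saltPassword
    const hashed = hashPassword(salted);   // 2) hashPassword
    this.storedPassword = hashed;          // 3) savePassword
  }

  checkPassword(password: string): boolean {
    return hashPassword(saltPassword(password)) === this.storedPassword;
  }
}
```

From the caller’s side, `user.setPassword(form.password)` is a single opaque operation; the three internal steps never appear in the caller’s code.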

• This is a great question.

Months ago, I had multiple drafts of posts formulating abstraction in terms of transformations on causal models. The two main transformations were:

• Glom together some variables

• Throw out information from a variable

What you’re calling composition is, I believe, the first one.

Eventually, I came to the view that it’s information throw-away specifically which really characterizes abstraction—it’s certainly the part where most of the interesting properties come from. But in the majority of use-cases, at least some degree of composition takes place as a sort of pre-processing step.

• User looks like it’s just a composition, although we could talk about it as a degenerate case of abstraction where all of the information is potentially relevant to things outside the User object itself, so we throw away nothing. That said, a lot of what we do with a User object actually does involve ignoring the information in its fields—e.g. we don’t think about emails and passwords when declaring a List<User> type or working with that list. So maybe there’s a case to be made that it’s an abstraction in the information-throw-away sense.

• setPassword does throw away information: it throws away the individual steps. To an outside caller, it’s just a black-box function which sets the password; they don’t know anything about the internal steps.

So to the extent that these are both “throwing away information”, it’s in the sense that large chunks of our code treat them as black boxes and don’t look at their internal details. When things do look at their internal details, those things are “close to” the objects, so the abstraction breaks down/leaks—e.g. if something tried to reconstruct the internal steps of setPassword, that would definitely be an example of a leaky abstraction.
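A toy illustration of that kind of leak (all names and the storage format here are invented): a caller that uses only the black-box interface keeps working no matter how the internals change, while code that reconstructs the internal format silently depends on it.

```typescript
class Vault {
  private stored = "";

  // Internal detail: the current storage format happens to be "v1:" plus the
  // reversed password. Callers are not supposed to know this.
  setPassword(p: string): void {
    this.stored = "v1:" + p.split("").reverse().join("");
  }

  check(p: string): boolean {
    return this.stored === "v1:" + p.split("").reverse().join("");
  }
}

// Respects the abstraction: only the public interface is used.
function safeCheck(v: Vault, p: string): boolean {
  return v.check(p);
}

// Leaks the abstraction: reconstructs the internal "v1" format, and breaks
// the moment the storage scheme changes.
function leakyReconstruct(p: string): string {
  return "v1:" + p.split("").reverse().join("");
}
```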

• Hm, it seems to me that there is a distinction between 1) hiding information (or encapsulating it, or making it private), 2) ignoring it, and 3) getting rid of it altogether.

1. For setPassword, perhaps a programmer who uses this method can’t see the internals of what is actually happening (the salting, hashing, and storing). They just call user.setPassword(form.password) and it does what they need it to do.

2. For User, in the example you give with List<User>, maybe we want to count how many users there are, and in doing so we don’t care what properties users have. It could be email and password, or it could be username and dob; in the context of counting users, you don’t care. However, the inner details aren’t actually hidden—you’re just choosing to ignore them.

3. For ideal gases, we’re getting rid of the information about particles. It’s not that it’s hidden/private/encapsulated; it’s just not even there after we replace it with the summary statistics.

What do you think? Am I misunderstanding something?

And in the case that I am correct about the distinction, I wonder if it’s something worth pointing out.
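The three cases in the list above could be sketched like this (a hypothetical illustration; all names are invented):

```typescript
// 1) Hiding: the detail exists but is inaccessible from outside the object.
class Account {
  private passwordHash = ""; // hidden from callers, but still physically there
  setPassword(p: string): void {
    this.passwordHash = "hash:" + p;
  }
}

// 2) Ignoring: the detail is fully accessible, but this code never looks at it.
function countUsers(users: { email: string; password: string }[]): number {
  return users.length; // every property of every user is simply ignored
}

// 3) Getting rid of: the detail is replaced by summary statistics; after the
// replacement, the particle-level information is no longer there at all.
function summarize(energies: number[]): { count: number; meanEnergy: number } {
  const count = energies.length;
  const meanEnergy = energies.reduce((a, b) => a + b, 0) / count;
  return { count, meanEnergy };
}
```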

• Sounds like the distinction is about where/how we’re drawing the abstraction boundaries.

• “Hiding information” suggests that there’s some object X with a boundary (i.e. a Markov blanket), and only the summary information is visible outside that boundary.

• “Ignoring information” suggests that there’s some other object(s) Y with a boundary around them, and only the summary information about X is visible inside that boundary.

So basically we’re defining which variables are “far away” by exclusion in one case (i.e. “everything except blah is far away”) and inclusion in the other case (i.e. “only blah is far away”). I could definitely imagine the two having different algorithmic implications and different applications.

As for “getting rid of information”, I think that’s hiding information plus somehow eliminating our own ability to observe the hidden part. Again, I could definitely imagine that having additional algorithmic implications or applications. (Though this one feels weird for me to think about at all; I usually imagine everything from an external perspective where everything is always observable and immutable.)

• Yeah, I think your descriptions match what I was getting at.

• (Typo: First bullet point under “Ideal Gas” should use  instead of .)

• Fixed, thanks.

• Thanks a lot for this compressed summary! As someone who tries to understand your work, but is sometimes lost within your sequence, this helps a lot.

I cannot comment on the maths in any interesting way, but I feel that your restricted notion of abstraction—where the high-level summary captures what’s relevant to “far away” variables—works very well with my intuition. I like that it works with “far away” in time too, for example in abstracting current events as memories for future use.