Hiding Complexity

1. The Principle

Suppose you have some difficult cognitive problem you want to solve. What is the difference between (1) making progress on the problem by thinking about it for an hour and (2) solving a well-defined subproblem whose solution is useful for the entire problem?

(Finding a good characterization of the ‘subproblem’ category is important for Factored Cognition, but for [this post minus the last chapter], you can think of it purely as a problem of epistemic rationality and human thinking.)

I expect most readers to share the intuition that there is a difference. However, the question appears ill-defined on second glance. ‘Making progress’ has to cash out as learning things you didn’t know before, and it’s unclear how that isn’t ‘solving subproblems’. Whatever you learned could probably be considered the solution to some problem.

If we accept this, then both (1) and (2) technically involve solving subproblems. Nonetheless, we would intuitively talk about subproblems in (2) and not in (1). Can we characterize this difference formally? Is there a well-defined, low-level quantity such that our intuition as to whether we would call a bundle of cognitive work a ‘subproblem’ corresponds to the size of this quantity? I think there is. If you want, take a minute to think about it yourself; I’ve put my proposed solution into spoilers.

I think the quantity is the length of the subproblem’s solution, where by “solution”, I mean “the information about the subproblem relevant for solving the entire problem”.

As an example, suppose the entire problem is “figure out the best next move in a chess game”. Let’s contrast (1) and (2):

  • (1) was someone thinking about this for an hour. The ‘solution’ here consists of everything she learns throughout that time, which may include many different ideas/insights about different possible moves/resolved confusions about the game state. There is probably no way to summarize all that information briefly.

  • (2) was solving a well-defined subproblem. An example here is, “figure out how good Be5 is”.[1] If the other side can checkmate in four turns given that move, then the entire solution to this subproblem is the three-word statement “Be5 is terrible”.

2. The Software Analogy

Before we get to why I think the principle matters, let’s try to understand it better. I think the analogy to software design is helpful here.

Suppose a company wants to design some big project that will take about 900k (i.e., 900,000) lines of code. How difficult is this? Here is a naive calculation:

An amateur programmer with Python can write a 50-line procedure without bugs in an hour, which suggests a total time requirement of 18k hours. Thus, a hundred amateur programmers working 30 hours a week can write the project in six weeks.
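For concreteness, the naive estimate can be spelled out as arithmetic (a sketch; the numbers are exactly the ones from the text):

```python
# Naive linear extrapolation using the text's numbers.
lines_total = 900_000       # size of the whole project
lines_per_hour = 50         # one amateur programmer, bug-free Python
programmers = 100
hours_per_week = 30

total_hours = lines_total / lines_per_hour            # 18,000 hours
weeks = total_hours / (programmers * hours_per_week)  # 6 weeks
print(total_hours, weeks)  # 18000.0 6.0
```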

I’m not sure how far this calculation is off, but I think it’s at least a factor of 20. This suggests that linear extrapolation doesn’t work, and the reason for this is simple: as the size of the project goes up, not only is there more code to implement, but every piece of code becomes harder because the entire project is more complex. There are more dependencies, more sources of error, and so forth.

This is where decompositions come in. Suppose the entire project can be visualized like this, where black boxes denote components (corresponding to pieces of code) and edges dependencies between components.

This naturally factors into three parts. Imagine you’re head of the team tasked with implementing the bottom-left part. You can look at your job like this:

(An ‘interface’ is purely a specification of the relationship, so the ellipses are each less than one black box.)

Your team still has to implement 300k lines of code, but regardless of how difficult this is, it’s only marginally harder than implementing a project that consists entirely of 300k lines. In the step from 300k to 900k, the cost actually does scale almost linearly.[2]
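To make the interface idea concrete in code, here is a minimal toy sketch of what an interface buys the team. All names here (`StorageInterface`, `InMemoryStorage`) are hypothetical, invented purely for illustration:

```python
# Toy sketch: the team only sees the other parts of the project through a
# narrow, well-defined interface, not through their internals.

class StorageInterface:
    """The specification the rest of the system promises: a key-value store."""
    def get(self, key: str) -> bytes: ...
    def put(self, key: str, value: bytes) -> None: ...

# Behind this interface there may be hundreds of thousands of lines of
# caching, replication, and error handling -- none of which the team
# has to think about while writing their own part.
class InMemoryStorage(StorageInterface):
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data[key]
    def put(self, key, value):
        self._data[key] = value

store: StorageInterface = InMemoryStorage()
store.put("move", b"Be5")
print(store.get("move"))  # b'Be5'
```

The team codes against the two-method specification alone; everything behind it can change freely without the team having to re-think their own code.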

As said at the outset, I’m talking about this not to make a point about software design but as an analogy to the topic of better and worse decompositions. In the analogy, the entire problem is coding the 900k-line system, the subproblems are coding the three parts, and the solutions to the second and third part are the interfaces.

I think this illustrates both why the mechanism is important and how exactly it works.

For the ‘why’, imagine the decomposition were a lot worse. In this case, there’s a higher overhead for each team, ergo higher overall cost. This has a direct analog in the case where a person is thinking about a problem on her own: the more complex the solutions to subproblems are, the harder it becomes for her to apply them to the entire problem. We are heavily bottlenecked by our ability to think about several things at once, so this can make a massive difference.

For the ‘how’, notice that, while the complexity of the entire system trivially grows with its size, the task of programming it can ideally be kept simple (as in the case above), and this is done by hiding complexity. From the perspective of your team (previous picture), almost the entire complexity of the remaining project is hidden: it’s been reduced to two simple, well-defined interfaces.

This mechanism is the same in the case where someone is working on a problem by herself: if she can carve out subproblems, and if those subproblems have short solutions, it dramatically reduces the perceived complexity of the entire problem. In both cases, we can think of the quality of a decomposition as the total amount of complexity it hides.[3]

3. Human Learning

I’ve come to view human learning primarily under the lens of hiding complexity. The world is extremely complicated; the only way to navigate it is to view it on many different layers of abstraction, such that each layer describes reality in a way that hides 99%+ of what’s really going on. Something as complex as going grocery shopping is commonly reduced to an interface that only models time requirement and results.

Abstractly, here is the principled argument as to why we know this is happening:

  1. Thinking about a lot of things at once feels hard.

  2. Any topic you understand well feels easy.

  3. Therefore, any topic you understand well doesn’t depend on a lot of things in your internal representation (i.e., in whatever structure your brain uses to store information).

  4. However, many topics do, in fact, depend on a lot of things.

  5. This implies your internal representation is hiding complexity.

For a more elaborate concrete example, consider the task “create a presentation about X”, where X is something relatively simple:

  • At the highest level, you might think solely about the amount of time you have left to do it; the complexity of how to do it is hidden.

  • One level lower, you might think about (1) creating the slides and (2) practicing the speaking part; the complexity of how to do either is hidden.

  • One level lower, you might think about (1) what points you want to make throughout your presentation and (2) in what order you want to make those points; the complexity of how to turn a point into a set of slides is hidden.

  • One level lower, you might think about what slides you want for each major point; the complexity of how to create each individual slide is hidden.

  • Et cetera.

In absolute terms, preparing a presentation is hard. It requires many different actions that must be carried out with a lot of precision for them to work. Nonetheless, the process of preparing it probably feels easy all the way because every level hides a ton of complexity. This works because you understand the process well: you know what levels of abstraction to use, and how and when to transition between them.
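The levels above can be mimicked as a toy call hierarchy, where each function exposes one simple step and hides how the level below it works. This is only a sketch; every name in it is made up for illustration:

```python
# Each function is one abstraction level; the levels below it are hidden
# behind a single call.

def create_presentation(topic):
    slides = create_slides(topic)   # level: slides vs. speaking part
    practice_talk(slides)
    return slides

def create_slides(topic):
    points = choose_points(topic)   # level: points and their order
    return [make_slide(p) for p in points]

def choose_points(topic):
    return [f"{topic}: intro", f"{topic}: main idea", f"{topic}: summary"]

def make_slide(point):
    # complexity of designing an individual slide hidden here
    return {"title": point}

def practice_talk(slides):
    pass  # rehearsal details hidden

slides = create_presentation("X")
print(len(slides))  # 3
```

At any moment you only think about one level; the call below it is a short, opaque “solution”, just like an interface in the software analogy.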

The extreme version of this view (which I’m not arguing for) is that learning is almost entirely about hiding complexity. When you first hear of some new concept, it sounds complicated and like it has lots of moving parts. When you have successfully learned it, the complexity is hidden, and when the complexity is hidden, you have learned it. Given that humans can only think about a few things at the same time, this process only bottoms out on exceedingly simple tasks. Thus, under the extreme view, it’s not turtles all the way down, but pretty far down. For the most part, learning just is representing concepts such that complexity is hidden.

I once wrote a tiny post titled ‘We tend to forget complicated things’. The observation was that, if you stop studying a subject when it feels like you barely understand it, you will almost certainly forget about it in time (and my conclusion was that you should always study until you think it’s easy). This agrees with the hiding complexity view: if something feels complicated, it’s a sign that you haven’t yet decomposed it such that complexity is hidden at every level, and hence haven’t learned it properly. Under this view, ‘learning complicated things’ is almost an oxymoron: proper learning must involve making things feel not-complicated.

It’s worth noting that this principle appears to apply even for memorizing random data, at least to some extent, even though you might expect pure memorization to be a counter-example.

There is also this lovely pie chart, which makes the same observation for mathematics:

That is, math is not inherently complicated; only the parts that you haven’t yet represented in a nice, complexity-hiding manner feel complicated. Once you have mastered a field, it feels wonderfully simple.

4. Factored Cognition

As mentioned at the outset, characterizing subproblems is important for Factored Cognition. Very briefly, Factored Cognition is about decomposing a problem into smaller problems. In one setting, a human has access to a model that is similar to herself, except (1) slightly dumber and (2) much faster (i.e., it can answer questions almost instantly).

The hope is that this combined system (of the human who is allowed to use the model as often as she likes) is more capable than either the human or the model by themselves, and the idea is that the human can amplify performance by decomposing big problems into smaller problems, letting the model solve the small problems, and using its answers to solve the big problem.

There are a ton of details to this, but most of them don’t matter for our purposes.[4] What does matter is that the model has no memory and can only give short answers. This means that the human can’t just tell it ‘make progress on the problem’, ‘make more progress on the problem’, and so on, but instead has to choose subproblems whose solutions can be described in a short message.
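A minimal sketch of that constraint, with a toy stateless model that may only emit short strings. Everything here, including the example questions and the length limit, is hypothetical:

```python
MAX_ANSWER_LEN = 40  # the model can only give short answers

def model(question: str) -> str:
    """Stand-in for the fast, slightly-dumber model: stateless, short output."""
    answers = {
        "How good is Be5?": "Be5 is terrible",
        "How good is Nf3?": "Nf3 is fine",
    }
    answer = answers.get(question, "unknown")
    assert len(answer) <= MAX_ANSWER_LEN  # long answers are not allowed
    return answer

def human(big_problem: str) -> str:
    # The human decomposes the big problem into subproblems whose solutions
    # fit in a short message, queries the model, and combines the answers.
    subquestions = ["How good is Be5?", "How good is Nf3?"]
    answers = [model(q) for q in subquestions]
    good = [a for a in answers if "terrible" not in a]
    return f"Best candidate among those checked: {good[0].split()[0]}"

print(human("figure out the best next move"))
# Best candidate among those checked: Nf3
```

Because the model keeps no state between calls, the human cannot ask it to ‘keep making progress’; each query must be a self-contained subproblem with a compressible solution.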

An unexpected takeaway from thinking about this is that I now view Factored Cognition as intimately related to learning in general, the reason being that both share the goal of choosing subproblems whose solutions are as short as possible:

  • In the setting I’ve described for Factored Cognition, this is immediate from the fact that the model can’t give long answers.

  • For learning, this is what I’ve argued in this post. (Note that optimizing subproblems to minimize the length of their solutions is synonymous with optimizing them to maximize their hidden complexity.)

In other words, Factored Cognition primarily asks you to do something that you want to do anyway when learning about a subject. I’ve found that better understanding the relationship between the two has changed my thinking about both of them.

(This post has been the second of two prologue posts for an upcoming sequence on Factored Cognition. I’ve posted them as stand-alone because they make points that go beyond that topic. This won’t be true for the remaining sequence, which will be narrowly focused on Factored Cognition and its relevance for Iterated Amplification and Debate.)

  1. Be5 is “move the bishop to square E5”. ↩︎

  2. One reason why this doesn’t reflect reality is that real decompositions will seldom be as good; another is that coming up with the decomposition is part of the work (and, by extension, part of the cost). Note that, even in this case, the three parts all need to be decomposed further, which may not work as well as the first decomposition did. ↩︎

  3. In software design, the term ‘modularity’ describes something similar, but it is not a perfect match. Wikipedia defines it as “a logical partitioning of the ‘software design’ that allows complex software to be manageable for the purpose of implementation and maintenance”. ↩︎

  4. After all, this is a post about hiding complexity! ↩︎