# An Educational Singularity

Knowl­edge com­pounds. Paint­ing is good prac­tice for ar­chi­tec­ture. Stand-up com­edy is good train­ing for stage magic.

The knowl­edge trans­fer­ence be­tween math and physics might be as high as 75%. Un­for­tu­nately, the knowl­edge trans­fer­ence be­tween two ran­dom fields tends to be small. Between math and draw­ing it might be as low as 1%. In my per­sonal ex­pe­rience it’s hard to find any pair of broadly ap­pli­ca­ble sub­jects that don’t have at least 1% over­lap. Usu­ally the num­ber will be be­tween 1% and 75%.

Let’s suppose mastering a new field of knowledge gives you a 5% discount on average on every subsequent field. Suppose it takes time $t_0$ for someone who knows nothing to master a new field. Then the amount of time $t_n$ it takes to master a new field is a function of how many fields $n$ you have already mastered:

$$t_n = t_0 \cdot 0.95^{n}$$

How much time $T_n$ does it take to learn $n$ fields instead of just the $n$th field?

$$T_n = \sum_{i=0}^{n-1} t_0 \cdot 0.95^{i} = t_0 \, \frac{1 - 0.95^{n}}{1 - 0.95}$$

This is a geometric series.

$T_n$ appears to converge:

$$\lim_{n \to \infty} T_n = \frac{t_0}{1 - 0.95} = 20\,t_0$$

$T_n$ converges for every positive transference rate. If we use 1% instead of 5% we just get $100\,t_0$.
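
The convergence claim can be checked numerically. The following is a minimal sketch, assuming a fixed compounding discount per mastered field; the function names and the parameter `t0` (the time a complete beginner needs to master one field) are labels introduced here for illustration, not the author's.

```python
def time_for_next_field(n_mastered, t0=1.0, discount=0.05):
    """Time to master one more field after n_mastered fields (assumed model)."""
    return t0 * (1.0 - discount) ** n_mastered


def total_time(n_fields, t0=1.0, discount=0.05):
    """Total time to master n_fields in sequence: a partial geometric sum."""
    return sum(time_for_next_field(i, t0, discount) for i in range(n_fields))


def limit_total_time(t0=1.0, discount=0.05):
    """Closed-form limit of the geometric series as n_fields grows without bound."""
    return t0 / discount


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, total_time(n))
    print("limit at 5%:", limit_total_time())
    print("limit at 1%:", limit_total_time(discount=0.01))
```

The partial sums approach `t0 / discount`, so a smaller discount only pushes the limit further out; it does not remove it.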

What does this mean?

# The ed­u­ca­tional phase transition

Obviously, someone who has hit this limit is not going to possess all of human knowledge. No matter how much you know it’s still going to take you some minimum time to learn the dialectal quirks of, say, Hejazi Arabic.

What hitting this limit really means is you’ve hit a certain endgame. The process of learning has undergone a phase transition. All the broad conceptual machinery and widely-applicable facts are there. Picking up anything new is just a matter of plugging new data into preexisting sockets.

More in­ter­est­ing than “what hap­pens at this phase tran­si­tion” is the idea that “there is a phase tran­si­tion” and we can reach it in finite time. Per­haps even within a hu­man life­time.

Much like stream en­try, I sus­pect any­one who achieves this phase tran­si­tion is bet­ter off keep­ing his/​her mouth shut about it in po­lite so­ciety.

# An al­ter­na­tive model

The concept of a singularity in the above model relies on a double-positive feedback loop. It assumes “Each subject you know conveys a compounding 5% discount on learning each subsequent subject.” If we tweak this assumption into “Each unit of study time conveys a compounding 5% discount on learning each subsequent subject” then the total learning time never converges as the number of fields goes to infinity.

How­ever, this al­ter­na­tive model still breaks down at high . It just breaks down grad­u­ally. For ex­am­ple, at the ex­po­nen­tial model pre­dicts an ab­surd learn­ing rate times that of a be­gin­ner. In the hu­man world such a high rate of learn­ing is in­dis­t­in­guish­able from in­finity. The model has bro­ken down.
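
One possible reading of this alternative model can be simulated directly. The sketch below assumes the discount compounds once per beginner-field's worth of accumulated study time (called `t0` here); that discretization and all names are assumptions made for illustration, and the author's exact formulation may differ.

```python
def cumulative_time_alt(n_fields, t0=1.0, discount=0.05):
    """Total study time for n_fields when the discount depends on time studied.

    Assumed rule: the next field costs t0 * (1 - discount) ** (total / t0),
    where total is the study time accumulated so far.
    """
    total = 0.0
    for _ in range(n_fields):
        total += t0 * (1.0 - discount) ** (total / t0)
    return total


def speedup(total, t0=1.0, discount=0.05):
    """Learning rate relative to a beginner after `total` study time (assumed rule)."""
    return (1.0 - discount) ** (-total / t0)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        t = cumulative_time_alt(n)
        print(n, t, speedup(t))
```

Under this reading the total keeps growing past the fixed-discount limit of `20 * t0`, only ever more slowly, while the implied speedup grows exponentially in study time.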

• Is “knowledge transference” a real thing, or one of those thousand things that didn’t replicate? There are many myths in education; I wonder if this is one of them.

(I tried Wikipe­dia, but it only has an ar­ti­cle on “knowl­edge trans­fer”, which is about shar­ing in­for­ma­tion be­tween peo­ple within an or­ga­ni­za­tion, i.e. some­thing com­pletely differ­ent.)

Bryan Ca­plan in The Case Against Ed­u­ca­tion writes:

[Teach­ers say:] A his­tory class can teach crit­i­cal think­ing; a sci­ence class can teach logic. Think­ing—all think­ing—builds men­tal mus­cles. The big­ger stu­dents’ men­tal mus­cles, the bet­ter they’ll be at what­ever job they even­tu­ally land.
[Is it true?] For the most part, no. Ed­u­ca­tional psy­chol­o­gists who spe­cial­ize in “trans­fer of learn­ing” have mea­sured the hid­den in­tel­lec­tual benefits of ed­u­ca­tion for over a cen­tury. Their chief dis­cov­ery: ed­u­ca­tion is nar­row. As a rule, stu­dents learn only the ma­te­rial you speci­fi­cally teach them . . . if you’re lucky. In the words of ed­u­ca­tional psy­chol­o­gists Perk­ins and Salomon, “Be­sides just plain for­get­ting, peo­ple com­monly fail to mar­shal what they know effec­tively in situ­a­tions out­side the class­room or in other classes in differ­ent dis­ci­plines. The bridge from school to be­yond or from this sub­ject to that other is a bridge too far.”
Many ex­per­i­ments study trans­fer of learn­ing un­der seem­ingly ideal con­di­tions. Re­searchers teach sub­jects how to an­swer Ques­tion A. Then they im­me­di­ately ask their sub­jects Ques­tion B, which can be hand­ily solved us­ing the same ap­proach as Ques­tion A. Un­less A and B look al­ike on the sur­face, or sub­jects get a heavy-handed hint to ap­ply the same ap­proach, learn­ing how to solve Ques­tion A rarely helps sub­jects an­swer Ques­tion B.
[In an ex­per­i­ment when sub­jects are told a mil­i­tary puz­zle and its solu­tion, and then a med­i­cal puz­zle which can be solved analog­i­cally,] A typ­i­cal suc­cess rate is 30%. Since about 10% of sub­jects who don’t hear the mil­i­tary prob­lem offer the con­ver­gence solu­tion, only one in five sub­jects trans­ferred what they learned. To reach a high (roughly 75%) suc­cess rate, you need to teach sub­jects the first story, then bluntly tell them to use the first story to solve the sec­ond.
To re­peat, such ex­per­i­ments mea­sure how hu­mans “learn how to think” un­der ideal con­di­tions: teach A, im­me­di­ately ask B, then see if sub­jects use A to solve B. Re­searchers are lead­ing the wit­ness. As psy­chol­o­gist Dou­glas Det­ter­man re­marks: “Teach­ing the prin­ci­ple in close as­so­ci­a­tion with test­ing trans­fer is not very differ­ent from tel­ling sub­jects that they should use the prin­ci­ple just taught. Tel­ling sub­jects to use a prin­ci­ple is not trans­fer. It is fol­low­ing in­struc­tions.”
Un­der less promis­ing con­di­tions, trans­fer is pre­dictably even worse. Mak­ing the sur­face fea­tures of A and B less similar im­pedes trans­fer. Ad­ding a time de­lay be­tween teach­ing A and test­ing B im­pedes trans­fer. Teach­ing A, then teach­ing an ir­rele­vant dis­tracter prob­lem, then test­ing B, im­pedes trans­fer. Teach­ing A in a class­room, then test­ing B in the real world im­pedes trans­fer. Hav­ing one per­son teach A and an­other per­son test B im­pedes trans­fer.
[...] No won­der even trans­fer op­ti­mists like Robert Haskell lament: “De­spite the im­por­tance of trans­fer of learn­ing, re­search find­ings over the past nine decades clearly show that as in­di­vi­d­u­als, and as ed­u­ca­tional in­sti­tu­tions, we have failed to achieve trans­fer of learn­ing on any sig­nifi­cant level.”
[...] Coun­terex­am­ples do ex­ist, but com­pared to teach­ers’ high hopes, effects are mod­est, nar­row, and of­ten only in one di­rec­tion. One ex­per­i­ment ran­domly taught one of two struc­turally equiv­a­lent top­ics: (a) the alge­bra of ar­ith­metic pro­gres­sion, or (b) the physics of con­stant ac­cel­er­a­tion. Re­searchers then asked alge­bra stu­dents to solve the physics prob­lems, and physics stu­dents to solve the alge­bra prob­lems. Only 10% of the physics stu­dents used what they learned to solve the alge­bra prob­lems. But a re­mark­able 72% of the alge­bra stu­dents used what they learned to solve the physics prob­lems. Ap­ply­ing ab­stract math to con­crete physics comes much more nat­u­rally than gen­er­al­iz­ing from con­crete physics to ab­stract math.
[...] Each ma­jor sharply im­proved on pre­cisely one sub­test. So­cial sci­ence and psy­chol­ogy ma­jors be­came much bet­ter at statis­ti­cal rea­son­ing—the abil­ity to ap­ply “the law of large num­bers and the re­gres­sion or base rate prin­ci­ples” to both “sci­en­tific and ev­ery­day-life con­texts.” Nat­u­ral sci­ence and hu­man­i­ties ma­jors be­came much bet­ter at con­di­tional rea­son­ing—the abil­ity to cor­rectly an­a­lyze “if . . . then” and “if and only if” prob­lems. On re­main­ing sub­tests, how­ever, gains af­ter three and half years of col­lege were mod­est or nonex­is­tent.
[...] Trans­fer re­searchers usu­ally be­gin their ca­reers as ideal­ists. Be­fore study­ing ed­u­ca­tional psy­chol­ogy, they take their power to “teach stu­dents how to think” for granted. When they dis­cover the pro­fes­sional con­sen­sus against trans­fer, they think they can over­turn it. Even­tu­ally, though, young re­searchers grow sad­der and wiser. The sci­en­tific ev­i­dence wears them down—and their first­hand ex­pe­rience as ed­u­ca­tors finishes the job

Intuitively, it seems to me that having a good model of the world trained on some subjects should provide some advantage at other subjects. But either it is an obvious prerequisite (such as: understanding chemistry helps you understand biochemistry) or the benefits are likely to be small (e.g. from physics I could learn that the universe follows relatively simple impersonal laws; but that alone does not tell me which laws are followed in sociology or computer science). Having good general knowledge can inoculate one against some fake theories (e.g. physics and chemistry against homeopathy), but after removing the fake frameworks there is still much to learn. Also, the transferred knowledge (e.g. “there is no supernatural, nature follows impersonal laws”) is the same for all natural sciences, so the “X%” you get from physics is the same as the “X%” you get from chemistry; you do not get “2X%” after learning both of them.

• Ca­plan is cor­rect here. There’s no ‘far trans­fer’ of the sort which might even slightly re­sem­ble ‘get a 5% dis­count on all fu­ture fields you study’. (Not that we see any­one who ex­hibits such an ‘ed­u­ca­tional sin­gu­lar­ity’ in prac­tice, any­way.) At best there might be a sort of meta-study-skill which gives a one-off ‘far trans­fer’ effect, like learn­ing how to use search en­g­ines or spaced rep­e­ti­tion, but it’s quickly ex­hausted and of course just one doesn’t give any sin­gu­lar­ity-es­que effect.

A more plausible model would be one with pure near-transfer: every field has a few adjacent fields which give, say, a 5% near-transfer. So one person could learn physics/chemistry/biology, for example, in 2.9x the time of a single field, versus the 3x total it takes 3 individuals learning the 3 fields separately.
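
The 2.9x figure can be reproduced with a small sketch, assuming the near-transfer discount is flat (each field after the first gets a single 5% discount, with no compounding). The function name and parameters are hypothetical, chosen only for this illustration.

```python
def near_transfer_total(n_fields, t0=1.0, discount=0.05):
    """Total time for n adjacent fields with a flat, non-compounding discount.

    Assumption: the first field costs t0 and every later adjacent field
    costs t0 * (1 - discount), regardless of how many fields came before.
    """
    if n_fields <= 0:
        return 0.0
    return t0 + (n_fields - 1) * t0 * (1.0 - discount)


if __name__ == "__main__":
    print(near_transfer_total(3))    # physics, chemistry, biology
    print(near_transfer_total(100))  # grows linearly, no finite limit
```

Because the cost per extra field never drops below `t0 * (1 - discount)`, the total grows linearly and there is no singularity in this model.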

• The model is very simple and the conclusion pretty far-reaching, although interesting. Rather than assume that the conclusion is true, I would hunt for what modelling details were glossed over.

Say both painting and stand-up comedy teach self-expression. If magic utilises that, then it doesn’t benefit doubly from it. That is, learning a field lowers how much other fields support the learning of new fields.

I could also see how learning a field cements a mindset that makes it harder to learn something than it would be for a completely naive person. Say a lawyer benefits from a mechanistic, blind interpretation of rules, and painting supports an impulsive reinterpretation and forfeiting of rule use. The two experts’ teachings would actively resist the other kind of adaptation. Now it might be its own skill to not make them conflict that much, or to find the context barriers where one approach is applicable over the other. But this is still extra work compared to someone to whom the area is the only truth. That is, while there might be “synergistic” pairs, the probability that you have “antisynergistic” pairs increases as you pick up fields.

Even if the simple analysis isn’t ironclad, it is likely that the value of being a polymath is undervalued, and the exact circumstances where it makes sense to adopt a polymath strategy rather than an expert strategy are not that widely discussed. A further complication is that a group of experts with different areas of expertise is somewhat comparable to a group of homogeneous polymaths. So even if moving to a more polymath strategy would make a single person more competent, it’s likely that being more starkly expert would increase the group’s effectiveness, if others can extend enough trust to be dominated by the opinions of the experts. This might also have its own singularity conditions. That is, at some point there is enough trust that any area you can train a single person to be an expert on, the group can be made to effectively have by adding a person to it.