# on wellunderstoodness

epistemic status: nothing new, informal, may or may not provide a novel compression which gives at least one person more clarity. Also, not an alignment post; it just uses arguments about alignment as a motivating example, but the higher-level takeaway is extremely similar to the Rocket Alignment dialogue and to the definition of “deconfusion” that MIRI offered recently (even though I’m making a very low-level point)

notation: please read a ← x as “a replaced with x”, e.g. (fx) x←5 ⇒ (mx+b) x←5 ⇒ m·5+b. (typically the sequents literature writes fx[5/x], but I find that much harder to read).
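To make the notation concrete, here is a hypothetical Python sketch (all names mine, not from the post) where substitution is literally replacement inside a symbolic expression tree:

```python
# A tiny symbolic expression: m*x + b, with substitution as dict lookup.
def subst(expr, env):
    """Replace variables in a nested tuple expression with values from env."""
    if isinstance(expr, str):                 # a variable name
        return env.get(expr, expr)            # substitute if bound, else keep
    if isinstance(expr, tuple):               # an operator node like ('+', l, r)
        op, *args = expr
        return (op, *(subst(a, env) for a in args))
    return expr                               # a literal number

fx = ('+', ('*', 'm', 'x'), 'b')              # (mx + b)
print(subst(fx, {'x': 5}))                    # (fx) x←5  ⇒  ('+', ('*', 'm', 5), 'b')
```

Nothing deep is happening: the whole operation is “find the bound name, put the value there”.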

# motivation

A younger version of me once said “What’s the big deal with sorcerer’s apprentice / paperclip maximizer? you would obviously just give an ε and say that you want to be satisfied up to a threshold”. But ultimately it isn’t hard to believe that software developers don’t want to be worrying about whether some hapless fool somewhere forgot to give the ε. Not to mention that we don’t expect the need for such thresholds to be finitely enumerable. Not to mention the need to pick the right epsilons.

The aspiration of value alignment just tells you “we want to really understand goal description, we want to understand it so hard that we make epsilons (chasing after the perfectly calibrated thresholds) obsolete”. Well, is it clear to everyone what “really understand” even means? It’s definitely not obvious, and I don’t think everyone’s on the same page about it.

# core claim

I want to focus on the following idea:

### to be mathematically understood is to be reduced to substitution instances.

Consider: linear algebra. If I want to teach someone the meaning and the importance of ax+by+c=0, I only need to make sure (skipping over addition, multiplication, and equality) that they can distinguish between a scalar and a variable, and then sculpt one form from another. Maybe it generalizes to arbitrary length and to arbitrary numbers of equations as nicely as it does because the universe rolled us some dumb luck, it’s unclear to me, but I know that every time I see a flat thing in the world I can say “grab the normal forms from the sacred texts and show me by substitution how this data is an instance of that”. Yes, if I had coffee I’d derive it from something more primitive and less sacred, but I’ve been switching to tea.
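The “show me by substitution how this data is an instance of that” move can be sketched as plugging a point into the normal form and checking that it vanishes (hypothetical Python; the example line and coefficients are mine):

```python
# A line in normal form ax + by + c = 0; membership is checked by substitution.
def on_line(a, b, c, x, y, tol=1e-9):
    """Substitute the point (x, y) into ax + by + c and check it vanishes."""
    return abs(a * x + b * y + c) < tol

# The line y = 2x + 1, i.e. 2x - y + 1 = 0:
print(on_line(2, -1, 1, x=3, y=7))   # (3, 7) is an instance -> True
print(on_line(2, -1, 1, x=3, y=8))   # (3, 8) is not -> False
```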

Substitution isn’t at the root of the “all the questions… that can be asked in an introductory course can be answered in an introductory course” property, but I view it as the only mechanism in a typical set of formal mechanisms that roots the possibility of particularization. By which I mean the clean, elegant, “all wrapped up, time for lunch” kind.

Substitutions can behave as a path from a more abstract level to a more concrete level, and if you’re lucky they come with types to tell us which things fit into which other things.

A fundamentalist monk of the deductionist faith may never succumb to the dark inferences, but they can substitute if they believe they are entitled to do so, and they can be told they are entitled by types. A universally quantified statement ∀a:τ. φ(a) allows even our monk to say a ← 5 just as soon as they establish that 5 : τ. Yes, without the temptation of the dark inferences they’d never have known that the data could be massaged into the form and ultimately resolved by substitution, but this is exactly why we’re considering a monk who can’t induct or abduct in public, rather than an automaton who can’t induct or abduct at all. The closer you get to deductionist fundamentalism, the more you realize that the whole “never succumb to the dark inferences” thing is more of a guideline anyway.
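A minimal sketch of “entitled by types”, in hypothetical Python (my illustration, not any formal system): the substitution goes through only once the value checks against the variable’s declared type.

```python
def typed_subst(var, declared_type, value, body):
    """Substitute value for var in body, but only if value : declared_type."""
    if not isinstance(value, declared_type):
        raise TypeError(f"{value!r} is not entitled to replace {var}")
    return body(value)

# forall a : int, a + 1 -- the monk may say a ← 5 once 5 : int is established.
phi = lambda a: a + 1
print(typed_subst('a', int, 5, phi))    # 6
# typed_subst('a', int, 'five', phi)    # would raise TypeError: not entitled
```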

Problems in well-understood domains impose some inferential cost to figure out which forms they’re an instance of. But that’s it. You do that, and it’s constant time from there; substitution only costs 2 inferenceBucks, maybe 3.

Now compare that to ill-understood domains, like solutions to differential equations. No, not the space of “practical” DEs that humans do for economic reasons, I mean every solution for every DE, the space which would only be “practical” if armies from dozens of weird Tegmark II’s were attacking all at once. You can see the monk stepping back: “woah woah woah, I didn’t sign up for this. I can only do things I know how to do, I can’t go poking and prodding in these volatile functionspaces, it’s too dangerous!” You’d need some shoot-from-the-hip engineers, the kind that maybe forget an epsilon here and there and scratch their head for an hour before they catch it, but they have to move at that pace! If you’re not willing to risk missing, don’t shoot from the hip.

I will give two off-the-cuff “theorems”, steamrolling/handwaving through two major problems: 1. variety in skill level or cognitive capacity; 2. domains don’t really have borders, we don’t really have a way to carve math into nations. But we won’t let either of those discourage us!

#### “Theorem”: in a well-understood domain, there is a strict upper bound on the inferential cost of solving an arbitrary problem.

There is at least some social component here; on well-tread paths you’re far less likely to bother with the wrong sinks/rabbit holes. Others in the community have provided wonderful compressions for you, so you don’t have to be a big whizzkid. We expect an upper bound because even a brute force search for your problem in a corpus of knowledge would terminate in finite time, assuming the corpus is finite and the search doesn’t attempt a faulty regex. And like I said, after that search (which is probably slightly better than brute force), substitution is a constant number of steps.
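That two-phase cost model (finite search for the matching form, then constant-cost substitution) can be sketched as follows; the corpus, forms, and solvers here are all hypothetical toys of my own:

```python
# A "corpus" of normal forms: (name, matcher, solver) triples.
corpus = [
    ("linear ax + b = 0",  lambda p: len(p) == 2 and p[0] != 0,
                           lambda p: -p[1] / p[0]),
    ("quadratic x^2 = c",  lambda p: len(p) == 1,
                           lambda p: p[0] ** 0.5),
]

def solve(problem):
    for name, matches, solver in corpus:   # search cost: bounded by corpus size
        if matches(problem):
            return name, solver(problem)   # substitution: constant cost from here
    raise ValueError("ill-understood: no upper bound offered")

print(solve((2, -6)))   # 2x - 6 = 0  ->  ('linear ax + b = 0', 3.0)
```

The upper bound in the “theorem” is just: (size of corpus) × (cost per match attempt) + (constant substitution cost).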

#### “Followup theorem”: in an ill-understood domain, we really can’t give an upper bound on the inferential cost of solving an arbitrary problem. To play it safe, we’ll say that problems cost an infinite amount of inferenceBucks unless proven otherwise.

to close where I began, we know that sorcerer’s apprentice (i.e., goal description as a mathematical domain) is ill-understood because “fill the cauldron with water” ⇒ “oh crap, i meant maximize p subject to the constraint p ≤ 1 − ε” ⇒ “oh crap, one more, last one i swear… ” etc. is worst case an infinite trap.

## I drew the core in­sight from conversation

At the end of the day, “well-understood just means substitution” was my best attempt at clustering the way people talked about different aspects of math in college.

notes:

• “inferenceBucks” sounds almost like graph distance, where each inference step is some edge, but I was intending something more like small-step semantics.

• The maximize p = P(cauldron full) constrained by p ≤ 1 − ε objective has really weird failure modes. This is how I think it would go. Take over the world, using a method that has some chance of failing. Build giant computers to calculate the exact chance of your takeover. Build a random bucket filler to make the probability work out. I.e., if ε = 3%, then the AI does its best to take over the world; once it succeeds, it calculates that its plan had a 2% chance of failure. So it builds a bucket filler that has a 97/98 chance of working. This policy leaves the chance of the cauldron being filled at exactly 97%.
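Assuming the numbers in that story (a 3% slack ε and a takeover plan with a 2% failure chance), the arithmetic can be sanity-checked in a couple of lines of Python:

```python
# P(cauldron full) = P(takeover succeeds) * P(bucket filler works | takeover)
p_takeover = 1 - 0.02        # the plan had a 2% chance of failure
p_filler   = 97 / 98         # the filler the AI builds to hit the constraint
p_full     = p_takeover * p_filler
print(round(p_full, 10))     # ≈ 0.97, i.e. exactly 1 − ε
```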