# on wellunderstoodness

*epistemic status: nothing new, informal, may or may not provide a novel compression which gives at least one person more clarity. Also, not an alignment post, just uses arguments about alignment as a motivating example, but the higher-level takeaway is extremely similar to the Rocket Alignment dialogue and to the definition of “deconfusion” that MIRI offered recently (even though I’m making a very low-level point)*

notation: please read `a ← x`

as “a replaced with x”, e.g. `(fx) x<-5 ⇒ (mx+b) x<-5 ⇒ m5+b`

. (typically the sequents literature uses `fx[5/x]`

but I find this much harder to read).

# motivation

A younger version of me once said *“What’s the big deal with sorceror’s apprentice / paperclip maximizer? you would obviously just give an and say that you want to be satisfied up to a threshold”*. But ultimately it isn’t hard to believe that software developers don’t want to be worrying about whether some hapless fool somewhere forgot to give the . Not to mention that we don’t expect the need for such thresholds to be finitely enumerable. Not to mention the need to pick the right epsilons.

The aspiration of value alignment just tells you “we want to *really understand* goal description, we want to understand it *so hard* that we make epsilons (chasing after the perfectly calibrated thresholds) obsolete”. Well, is it clear to everyone what “really understand” even means? It’s definitely not obvious, and I don’t think everyone’s on the same page about it.

# core claim

I want to focus on the following idea;

### to be mathematically understood is to be reduced to substitution instances.

Consider; linear algebra. If I want to teach someone the meaning and the importance of `ax+by+c=0`

I only need to make sure (skipping over addition, multiplication, and equality) that they can distinguish between a scalar and a variable and then sculpt one form from another. Maybe it generalizes to arbitrary length and to arbitrary number of equations as nicely as it does because the universe rolled us some dumb luck, it’s unclear to me, but I know that every time I see a flat thing in the world I can say “grab the normal forms from the sacred texts and show me by substitution how this data is an instance of that”. Yes, if I had coffee I’d derive it from something more primitive and less sacred, but I’ve been switching to tea.

Substitution isn’t at the root of the “all the questions… that can be asked in an introductory course can be answered in an introductory course” property, but I view it as the only *mechanism* in a typical set of formal mechanisms that roots the possibility of particularization. By which I mean the clean, elegant, “all wrapped up, time for lunch” kind.

Substitutions can behave as a path from a more abstract level to a more concrete level, but if you’re lucky they have *types* to tell us which things fit into other things.

A fundamentalist monk of the deductionist faith may never succumb to the dark inferences, but they can *substitute* if they believe they are entitled to do so, and they can be told they are entitled by *types*. allows even our monk to say `a ← 5`

just as soon as they establish that . Yes, without the temptation of the dark inferences they’d never have known that the data could be massaged into the form and ultimately resolved by substitution, but this is exactly why we’re considering a monk who can’t induct or abduct *in public*, rather than an automaton who can’t induct or abduct *at all*. The closer you get to deductionist fundamentalism, the more you realize that the whole “never succumb to the dark inferences” thing is more of a guideline anyway.

Problems in well-understood domains impose some inferential cost to figure out which forms they’re an instance of. But that’s it. You do that, and it’s constant time from there, substitution only costs 2 inferenceBucks, maybe 3.

Now compare that to ill-understood domains, like solutions to differential equations. No, not the space of “practical” DEs that humans do for economic reasons, I mean every solution for every DE, the space which would only be “practical” if armies from dozens of weird Tegmark II’s were attacking all at once. You can see the monk stepping back, “woah woah woah, I didn’t sign up for this. I can only do things I know how to do, I can’t go poking and prodding in these volatile functionspaces, it’s too dangerous!” You’d need some shoot-from-the-hip engineers, the kind that maybe forget an epsilon here and there and scratch their head for an hour before they catch it, but they have to move at that pace! If you’re not willing to risk missing, don’t shoot from the hip.

I will give two off-the-cuff “theorems”, steamrolling/handwaving through two major problems; 1. variety in skill level or cognitive capacity; 2. domains don’t really have borders, we don’t really have a way to carve math into nations. But we won’t let either of those discourage us!

#### “Theorem”: in a well-understood domain, there is a strict upper bound on the inferential cost of solving an arbitrary problem.

There is at least some social component here; on well-tread paths you’re far less likely to bother with the wrong sink/rabbit holes. Others in the community have provided wonderful compressions for you, so you don’t have to be a big whizzkid.
We expect an upper bound because even a brute force search for `your problem`

in `corpus of knowledge`

would terminate in finite time, assuming the corpus is finite and the search doesn’t attempt a faulty regex. And like I said, after that search (which is probably slightly better than brute force), substitution is a constant number of steps.

#### “Followup theorem”: in an ill-understood domain, we really can’t give an upper bound on the inferential cost of solving an arbitrary problem. To play it safe, we’ll say that problems cost an infinite amount of inferenceBucks unless proven otherwise.

to close where I began, we know that sorcerer’s apprentice (i.e., goal description as a mathematical domain) is ill-understood because “fill the cauldron with water” “oh crap, i meant maximize p subject to the constraint “oh crap, one more, last one i swear… ” etc. is worst case an infinite trap.

## I drew the core insight from conversation

At the end of the day, “well-understood just means substitution” was my best attempt at clustering the way people talked about different aspects of math in college.

notes:

“inferenceBucks” sounds almost like graph distance when each inference step is some edge but I was intending more like small-step semantics.

The maximise p=P(cauldrun full) constrained by p<1−ϵ has really weird failure modes. This is how I think it would go. Take over the world, using a method that has <ϵ chance of failing. Build giant computers to calculate the exact chance of your takeover. Build a random bucket filler to make the probability work out. Ie if ϵ=3%, then the AI does its best to take over the world, once it succeeds it calculates that its plan had a 2% chance of failure. So it builds a bucket filler that has 9798 chance of working. This policy leaves the chance of the cauldron being filled at exactly 97%.