Trace: Goals and Principles

In terms of research, I decided to devote the month of February mainly to foundations and tools. One project was to come up with a notation/language/framework which matches the way I've been thinking about computation—i.e. DAGs with symmetry and "clouds" representing DAG-structure-as-data. The tool I've been building—a Python library tentatively called Trace—isn't stable enough that I want to show it off yet, but I do think I've nailed down the core goals and principles, so it's time to write them up.

Goals

The main thing I need for my research is a data structure suitable for automated causal abstraction algorithms on arbitrary computations. Some subgoals:

  • Universality: the data structure should be able to represent any computation performed by a program

  • The data structure needs to be finite, which means leveraging symmetry to represent infinite computational DAGs

  • Computations must be straightforward both for a human to specify directly and for an algorithm to manipulate

  • Want to be able to do causal-DAG-like things, like query for parents/children of a node or perform interventions/counterfactuals along the lines of Pearl's do()

  • Need to handle DAGs with dynamic structure

  • Eventually I'll want to do reflective things with this data structure (i.e. self-modeling agents), so simplicity matters in the specification and core algorithms.

I'll give a bit more detail...

The first two subgoals basically amount to "I need a data structure representing the computation performed by an arbitrary program"—i.e. the trace of an arbitrary program. I do need to actually use the data structure, so it needs to be finite. (We could use a lazy infinite structure, but then I want to know what finite data structure is used to actually represent the infinite data structure.) The computational DAGs of programs are usually infinite but symmetric, so the solution probably needs to leverage symmetry in the representation.

I (a human) need to be able to specify computations—so I either need an interface which would basically be a programming language, or I need to transpile from an existing programming language. Ideally the human-facing representation would directly reflect the computer-facing data structure, which weighs against transpilation from an existing language. Also, existing languages have way more bells and whistles than I really want to deal with for this project, even setting aside the likely importance of simplicity for reflection.

I want to write algorithms which take in one computation, chew on it, and return another computation—that's the type signature of causal abstraction. To support those algorithms, I want the data structure to allow DAG-like things. I want a data structure which makes it easy to ask "what would have changed in this computation if the internal variables x, y, and z had been different?"—without needing to specify ahead of time which variables x, y, and z we might potentially want to fiddle with. I want to fiddle with any/all of them, after the fact.

I need a data structure which handles dynamic structure. This means both dynamically-generated computations (i.e. programs which write programs and then run them) and algorithms which make decisions based on the structure of some other computation (e.g. an algebraic optimization algorithm). These capabilities are the "clouds" needed to formulate reductive agents in computational DAGs.

I've also realized that dynamic structure is very common in day-to-day programming—it happens every time we hit a conditional statement. Without dynamic structure, a representation of computation has to resort to hackish special cases to handle the different dependency structures implied by the branches of a conditional. Directly supporting dynamic structure avoids those hacks.
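To make the conditional point concrete, here's a small sketch (entirely my own illustration, not Trace code): a dict wrapper that records which variables a computation actually reads, showing that the dependency structure of a conditional genuinely differs by branch.

```python
class Recorder(dict):
    """A dict that records which keys a computation actually reads,
    so we can see the dependency structure of each run."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = set()

    def __getitem__(self, key):
        self.reads.add(key)
        return super().__getitem__(key)

def branchy(env):
    # Which variables are parents of the result depends on env["c"]:
    # one branch reads x, the other reads y.
    if env["c"]:
        return env["x"] + 1
    return env["y"] * 2

env = Recorder(c=True, x=10, y=20)
branchy(env)
# env.reads == {"c", "x"} -- y was never a parent on this run
```

A static DAG would have to either over-approximate (make the result depend on both x and y) or special-case the conditional; dynamic structure sidesteps that.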

Those are my immediate needs, but I also want to support generalizations of all the above. A (simple) tool which is useful for a wide variety of applications is also likely to be more robustly useful even for my intended applications—it's more likely to be useful longer-term as my understanding of the problems I'm working on evolves, it's more likely to be useful in ways I didn't anticipate, and it's more likely to force key insights. To that end, I asked: "What can we do with computations, other than run them?" This produced a bunch of potentially-interesting use-cases to think about:

  • Query computational intermediates, e.g. extracting data from a simulation or debugging a program

  • Prove properties of the computation, e.g. big-O runtime or information security

  • Manipulate the computation algebraically, e.g. to solve equations or take limits

  • Manipulate the computation via interventions/counterfactuals, e.g. ask what-if questions

  • Make predictions about some parts of the computation based on partial information about other parts, e.g. inference

  • Embed the computation in some other computational model, e.g. transpilation or compilation

  • Message-passing or dynamic programming on the computational graph, e.g. belief propagation or backpropagation

I'm not trying to do all of this or even most of it, but these are all use-cases in the back of my mind. In particular, belief propagation and backpropagation are nontrivial algorithms which operate directly on the computational DAG, so they're natural use-cases/test-cases to think about. Algebraic manipulation and inference more generally are also good fits, although limited to special cases in practice.

Finally, one nongoal (though not an antigoal): language engineering. This tool (at least the human-facing interface) is sort-of-a-programming-language, but there are better people than me to worry about type systems and error handling and compiler optimizations and all that jazz. I need the tool to be usable for my research, but beyond that I'm more interested in key insights, data structures, and algorithms than in engineering and language design.

Principles

The usual method for representing a DAG is to give each node a (unique) name, and then assign a list of parents to each node, in order. When the DAG represents a computation (i.e. a circuit), we also include an expression on each node—one of a handful of atomic functions, e.g. addition, multiplication, comparison, boolean logic, and/or conditionals. Here's a logic circuit for a full adder as an example:

{
    A: Expression(function=input, parents=[]),
    B: Expression(function=input, parents=[]),
    Cin: Expression(function=input, parents=[]),
    S1: Expression(function=xor, parents=[A, B]),
    S: Expression(function=xor, parents=[S1, Cin]),
    Pair1: Expression(function=and, parents=[A, B]),
    Pair2: Expression(function=and, parents=[B, Cin]),
    Pair3: Expression(function=and, parents=[Cin, A]),
    C1: Expression(function=or, parents=[Pair1, Pair2]),
    Cout: Expression(function=or, parents=[C1, Pair3])
}
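To make this representation concrete, here's a minimal runnable sketch of the standard structure and how one might evaluate it. The names follow the notation above; the memoized-recursion evaluator is my own assumption about how evaluation would work, not Trace's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Expression:
    function: object        # a python callable, or the marker "input"
    parents: list = field(default_factory=list)

def evaluate(dag, inputs):
    """Evaluate every node of a computational DAG by memoized recursion."""
    values = dict(inputs)   # input nodes get their values from `inputs`
    def value_of(name):
        if name not in values:
            expr = dag[name]
            values[name] = expr.function(*[value_of(p) for p in expr.parents])
        return values[name]
    for name in dag:
        value_of(name)
    return values

# the full adder from above, with python lambdas as the atomic functions
xor = lambda a, b: a ^ b
and_ = lambda a, b: a & b
or_ = lambda a, b: a | b

full_adder = {
    "A": Expression("input"), "B": Expression("input"), "Cin": Expression("input"),
    "S1": Expression(xor, ["A", "B"]),
    "S": Expression(xor, ["S1", "Cin"]),
    "Pair1": Expression(and_, ["A", "B"]),
    "Pair2": Expression(and_, ["B", "Cin"]),
    "Pair3": Expression(and_, ["Cin", "A"]),
    "C1": Expression(or_, ["Pair1", "Pair2"]),
    "Cout": Expression(or_, ["C1", "Pair3"]),
}
out = evaluate(full_adder, {"A": 1, "B": 1, "Cin": 0})
# 1 + 1 + 0 = binary 10: sum bit 0, carry bit 1
```

Note that interventions are already easy here: overriding an entry of `values` before evaluation is exactly a do()-style surgery on the node.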

This is (roughly) the data structure typically used for causal models, so it's maximally convenient for causal abstraction as well as interventions/counterfactuals. I'd like to keep as much of that structure as I can.

The biggest problem with a standard DAG representation is that it doesn't leverage symmetry. That's inconvenient even for finite models with repeated components, but it's a deal-breaker for infinite DAGs, where we need to leverage symmetry in order to represent the DAG finitely at all. If we're talking about representing computation, then most of the DAGs we care about are infinite—for example, here's a diagram of the computation associated with a factorial function:

What do we need in order to compactly represent repeated components? For starters, presumably we only want to write out the internal structure of the component once. Assume we'll use the usual DAG representation for the internal structure: we give each node a name, then specify what it's reading from and what function it performs. But that means we'll be re-using names within different contexts (i.e. copies of the component)—in other words, we have some notion of scope.

That's principle 1: standard DAG representation + repeated components → re-use symbol names in different contexts.

Representing Context of a Symbol

In the standard computational DAG representation, we take a symbol and directly look up its defining expression. With multiple contexts/scopes, this turns into two steps: look up the context in which the symbol appears, then find its defining expression within that context. Adding in dynamic structure, the context itself may be given as a symbol, with its own meta-context and its own defining expression. What's the simplest data structure to support all that?

Simplest answer I have so far:

  • We have a Symbol type. Every Symbol has a literal and a context; the context gives an expression for the literal.

  • Contexts are represented just like the standard DAG representation.

  • Symbols have a get_value() function which gets the value of the literal within the context. For instance, Symbol(literal='x', context={'x': 2}).get_value() would return 2.

  • The literal and/or context can also be Symbols, to support dynamic structure.
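A minimal sketch of how these pieces might fit together (the class layout and the `_eval` helper are my guesses at the mechanics, not Trace's actual implementation):

```python
class Expression:
    def __init__(self, function, parents):
        self.function = function
        self.parents = parents

class Symbol:
    def __init__(self, literal, context=None):
        self.literal = literal
        self.context = context

    def get_value(self):
        # step 1: look up the literal's defining entry in the context
        entry = self.context[self.literal]
        # step 2: evaluate that entry
        return _eval(entry)

def _eval(entry):
    """Constants are themselves; Symbols recurse into their context;
    Expressions apply their function to their evaluated parents."""
    if isinstance(entry, Symbol):
        return entry.get_value()
    if isinstance(entry, Expression):
        return entry.function(*[_eval(p) for p in entry.parents])
    return entry  # a plain value

print(Symbol('x', {'x': 2}).get_value())  # 2

# a context entry can be an Expression over other Symbols:
ctx = {'x': 2}
ctx['y'] = Expression(lambda a, b: a + b, [Symbol('x', ctx), 3])
print(Symbol('y', ctx).get_value())  # 5
```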

In this notation, our full adder from earlier would be written something like:

context = {}
context.update({
    A: Expression(function=input, parents=[]),
    B: Expression(function=input, parents=[]),
    Cin: Expression(function=input, parents=[]),
    S1: Expression(function=xor, parents=[Symbol(A, context), Symbol(B, context)]),
    S: Expression(function=xor, parents=[Symbol(S1, context), Symbol(Cin, context)]),
    Pair1: Expression(function=and, parents=[Symbol(A, context), Symbol(B, context)]),
    Pair2: Expression(function=and, parents=[Symbol(B, context), Symbol(Cin, context)]),
    Pair3: Expression(function=and, parents=[Symbol(Cin, context), Symbol(A, context)]),
    C1: Expression(function=or, parents=[Symbol(Pair1, context), Symbol(Pair2, context)]),
    Cout: Expression(function=or, parents=[Symbol(C1, context), Symbol(Pair3, context)]),
})

Note that the symbols all point to the context in which they appear. That makes it annoying to write out—we have to first initialize the context, then set up all the Symbols to point to it.

To make things cleaner to write, I use a Context object: it's just like a dict, but if it contains any Symbols without an explicit context, then it points those symbols to itself. This way of writing things doesn't change the underlying data structure at all compared to the previous example; it just makes the "code" a bit easier for humans to read/write. I also abbreviate Expression as E and Symbol as S. Combining those notational changes and making everything a bit less verbose, a computation can be written something like this:

Context({
    A: E(input, []),
    B: E(input, []),
    Cin: E(input, []),
    S1: E(xor, [S(A), S(B)]),
    S: E(xor, [S(S1), S(Cin)]),
    Pair1: E(and, [S(A), S(B)]),
    Pair2: E(and, [S(B), S(Cin)]),
    Pair3: E(and, [S(Cin), S(A)]),
    C1: E(or, [S(Pair1), S(Pair2)]),
    Cout: E(or, [S(C1), S(Pair3)]),
})
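The Context trick is simple to implement. Here's one way it might look (my own sketch, which only wires up Symbols appearing directly in parent lists; the real Trace presumably handles more cases):

```python
class Symbol:
    def __init__(self, literal, context=None):
        self.literal = literal
        self.context = context

class E:  # Expression, abbreviated as in the text
    def __init__(self, function, parents):
        self.function = function
        self.parents = parents

class Context(dict):
    """Like a dict, but any Symbol written without an explicit
    context gets pointed at this Context itself."""
    def __init__(self, mapping):
        super().__init__(mapping)
        for entry in self.values():
            for parent in getattr(entry, "parents", []):
                if isinstance(parent, Symbol) and parent.context is None:
                    parent.context = self

xor = lambda a, b: a ^ b
ctx = Context({
    "A": 0,
    "B": 1,
    "S1": E(xor, [Symbol("A"), Symbol("B")]),
})
# the Symbols written without a context now point back at ctx
```

This preserves the underlying data structure exactly: after construction, every Symbol has an explicit context, just as in the verbose version.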

As the saying goes, any sufficiently complicated software project contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp. Don't use this as a primary programming language, folks.

Representing Interventions and Functions

We've broken symbol resolution up into two steps: get the context, and get the value of the literal within the context. That lets us re-use symbol names within different contexts. But we still need some way to specify a function call—i.e. a copy of some context with "input" symbols set to particular values.

As an example, for a computation of factorial(3), we might want to write something like this:

Context({
    n: 3,
    is_base_case: E(==, [S(n), 0]),
    recurse_result: S(result, <COPY OF THIS CONTEXT BUT WITH n REPLACED BY n-1>),
    non_base_result: E(*, [S(n), S(recurse_result)]),
    result: E(ternary, [S(is_base_case), 1, S(non_base_result)])
})

Key thing to notice: "copy of this context but with n replaced by n-1" sounds an awful lot like an intervention. We take a "model"—a context—and produce a copy, with a change to the expression for one of the symbols.

That's principle 2: function calls are produced by do()-style interventions on a context.

I'll use the standard function call notation for this. Simple example: if we take a context like Context({x: 1, y: 2, z: E(+, [S(x), S(y)])}), and apply the intervention {y: 3}, we'd write that as

Context({x: 1, y: 2, z: E(+, [S(x), S(y)])})({y: 3}) = Context({x: 1, y: 3, z: E(+, [S(x), S(y)])})
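Mechanically, the intervention itself is just "copy the context, overriding the intervened entries." A sketch, with contexts as plain dicts for brevity (the `intervene` helper is my own name for the operation):

```python
def intervene(context, changes):
    """do()-style intervention: return a copy of the context with the
    expressions for the intervened symbols replaced. The original
    context is left untouched."""
    new = dict(context)
    new.update(changes)
    return new

base = {"x": 1, "y": 2}          # standing in for Context({x: 1, y: 2, ...})
after = intervene(base, {"y": 3})
print(after)  # {'x': 1, 'y': 3}
print(base)   # {'x': 1, 'y': 2} -- unchanged
```

The copy-rather-than-mutate behavior is what makes after-the-fact counterfactuals cheap: the original model is still around to compare against.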

For our factorial example, that would look something like this:

factorial_context = Context({
    n: 3,
    is_base_case: E(==, [S(n), 0]),
    recurse_result: S(result, factorial_context({n: E(-, [S(n), 1])})),
    non_base_result: E(*, [S(n), S(recurse_result)]),
    result: E(ternary, [S(is_base_case), 1, S(non_base_result)])
})

This is kind of tricky to write—what I intend to convey is that factorial_context is a pointer to the whole Context object. If we wanted to write the whole thing in Trace notation, it would actually look like this:

outer_context = Context({
    factorial: Context({
        n: 3,
        factorial: "this string is ignored",
        is_base_case: E(==, [S(n), 0]),
        recurse_result: S(result, S(factorial)({n: E(-, [S(n), 1])})),
        non_base_result: E(*, [S(n), S(recurse_result)]),
        result: E(ternary, [S(is_base_case), 1, S(non_base_result)])
    })({factorial: S(factorial)})
})

We need a pointer to the factorial context inside the factorial context itself, so we create an outer context, then use another intervention to pass our pointer in. Viewed as a programming language, Trace lacks lexical scope—annoying for a programming language, but probably a good thing for a data structure.

At this point, we have all the pieces of a programming language. To run factorial(5), we could run S(result, S(factorial, outer_context)({n: 5})).get_value() with outer_context defined above.
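To check that these pieces really do add up to a programming language, here's a toy evaluator that runs the factorial computation end to end. Two caveats: I cheat by using a Python closure in place of the outer-context/intervention trick for the self-pointer, and the `ternary` case must be evaluated lazily (only the branch actually taken) so that the recursion bottoms out. All names here are my own sketch, not Trace's API.

```python
import operator

class E:  # Expression
    def __init__(self, function, parents):
        self.function, self.parents = function, parents

class S:  # Symbol
    def __init__(self, literal, context):
        self.literal, self.context = literal, context
    def get_value(self):
        return _eval(self.context[self.literal])

def _eval(entry):
    if isinstance(entry, S):
        return entry.get_value()
    if isinstance(entry, E):
        if entry.function == "ternary":
            # lazy: evaluate the condition, then only the branch taken,
            # so the recursion terminates at the base case
            cond, if_true, if_false = entry.parents
            return _eval(if_true) if _eval(cond) else _eval(if_false)
        return entry.function(*[_eval(p) for p in entry.parents])
    return entry  # plain value

def factorial_context(n):
    # a python closure stands in for the outer_context + intervention
    # machinery that provides the self-pointer in Trace notation
    ctx = {}
    ctx.update({
        "n": n,
        "is_base_case": E(operator.eq, [S("n", ctx), 0]),
        "recurse_result": E(lambda k: S("result", factorial_context(k - 1)).get_value(),
                            [S("n", ctx)]),
        "non_base_result": E(operator.mul, [S("n", ctx), S("recurse_result", ctx)]),
        "result": E("ternary", [S("is_base_case", ctx), 1, S("non_base_result", ctx)]),
    })
    return ctx

print(S("result", factorial_context(5)).get_value())  # 120
```

Because each entry is only evaluated when something asks for its value, this is exactly the lazy trace discussed next: intermediate contexts come into existence as get_value() walks back through the structure.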

More importantly, this immediately provides a way of accessing the program's trace. We have, effectively, a lazy data structure representing the trace of the program.

Let's imagine tracing our factorial function for n=3. Mixing our notation with some arrows, our program suggests something like this:

Key thing to notice: all Contexts except the first are dynamically generated, via interventions. As we trace back through the data structure, we'll need to call get_value() to generate the contexts. So when we actually get the values of the contexts, we'll end up with a data structure like this:

Sudden Stop

That's it for core principles. I'm currently prototyping some causal abstraction algorithms using Trace, and I'm tweaking things as I go—in particular, I may change the intervention notation to optimize for intervening on the trace rather than intervening on the program. We'll see.

Once I have some proper algorithms running and the notation has stabilized, I'll probably put it on GitHub along with some documentation.