Mental Context for Model Theory

I’m re­view­ing the books on the MIRI course list. After my first four book re­views I took a week off, fol­lowed up on some dan­gling ques­tions, and up­kept other side pro­jects. Then I dove into Model The­ory, by Chang and Keisler.

It has been three weeks. I have gained a de­cent foun­da­tion in model the­ory (by my own as­sess­ment), but I have not come close to com­plet­ing the text­book. There are a num­ber of other top­ics I want to touch upon be­fore De­cem­ber, so I’m putting Model The­ory aside for now. I’ll be re­vis­it­ing it in ei­ther Jan­uary or March to finish the job.

In the mean­time, I do not have a com­plete book re­view for you. In­stead, this is the first of three posts on my ex­pe­rience with model the­ory thus far.

This post will give you some fram­ing and con­text for model the­ory. I had to hop a num­ber of con­cep­tual hur­dles be­fore model the­ory started mak­ing sense — this post will con­tain some poin­t­ers that I wish I’d had three weeks ago. Th­ese tips and re­al­iza­tions are some­what gen­eral to learn­ing any logic or math; hope­fully some of you will find them use­ful.

Shortly, I’ll post a sum­mary of what I’ve learned so far. For the ca­sual reader, this may help de­mys­tify some heav­ily ad­vanced parts of the Heav­ily Ad­vanced Episte­mol­ogy se­quence (if you find it mys­te­ri­ous), and it may shed some light on some of the re­cent MIRI pa­pers. On a per­sonal note, there’s a lot I want to write down & solid­ify be­fore mov­ing on.

In a follow-up post, I’ll discuss my experience struggling to learn something difficult on my own: model theory has required significantly more cognitive effort than the previous textbooks did.

Between what was meant and what was said

Model the­ory is an ab­stract branch of math­e­mat­i­cal logic, which it­self is already too ab­stract for most. So al­low me to mo­ti­vate model the­ory a bit.

At its core, model the­ory is the study of what you said, as op­posed to what you meant. To give some in­tu­ition for this, I’ll re-tell an over­told story about an an­cient branch of math.

In olden times, Eu­clid built Geom­e­try upon five ax­ioms:

  1. You can draw a straight line seg­ment be­tween two points.

  2. You can extend line segments into infinitely long straight lines.

  3. You can draw a cir­cle from a straight line seg­ment, with the cen­ter at one end and ra­dius the line seg­ment.

  4. All right an­gles are con­gru­ent.

  5. If two lines are drawn which in­ter­sect a third in such a way that the sum of the in­ner an­gles on one side is less than two right an­gles, then the two lines in­evitably must in­ter­sect each other on that side if ex­tended far enough.

One of these things is not like the other. The fifth ax­iom is the only one which re­quires some effort to un­der­stand. In­tu­itively, it states that par­allel lines do not in­ter­sect. This state­ment irked Eu­clid for rea­sons apart from the ugli­ness of the ax­iom.

The fact that par­allel lines do not in­ter­sect seems like it should fol­low from the defi­ni­tion of lines and an­gles. It doesn’t seem like some­thing we should have to spec­ify in ad­di­tion. That we must as­sume par­allel lines do not in­ter­sect (rather than prov­ing it) was long seen as a wart on ge­om­e­try.

This wart irked math­e­mat­i­ci­ans for mil­len­nia, un­til fi­nally it was dis­cov­ered that the fifth ax­iom is in­de­pen­dent of the other four. You can build con­sis­tent sys­tems where par­allel lines in­ter­sect. You can build con­sis­tent sys­tems where they di­verge.

This seemed crazy, at the time: par­allel straight lines can­not di­verge! Surely, a ge­om­e­try in which they do is ab­surd!

The prob­lem is that math­e­mat­i­ci­ans were imag­in­ing “straight lines” in their head that did not match the math­e­mat­i­cal ob­jects speci­fied by the first four ax­ioms of Eu­clid.

This mis­take was in­vited by names which Eu­clid chose. “Straight lines” in­voke a men­tal image that is more spe­cific than that which the ax­ioms de­scribe. If you de­tach the provoca­tive words from the axioms

  1. You can make a LUME be­tween any two PTARS

  2. You can ex­tend a LUME into a SLUME

and so on, then it’s much eas­ier to un­der­stand that the LUMEs which Eu­clid’s ax­ioms de­scribe may not match up with the image of a “straight line” in your head. It is much eas­ier to un­der­stand that there may be in­ter­pre­ta­tions of LUME which do not obey the fifth pos­tu­late.

In fact, if you take Eu­clid’s first four pos­tu­lates, there are many pos­si­ble in­ter­pre­ta­tions in which “straight line” takes on a mul­ti­tude of mean­ings. This abil­ity to dis­con­nect the in­tended in­ter­pre­ta­tion from the available in­ter­pre­ta­tions is the bedrock of model the­ory. Model the­ory is the study of all in­ter­pre­ta­tions of a the­ory, not just the ones that the origi­nal au­thor in­tended.

Of course, model the­ory isn’t re­ally about find­ing sur­pris­ing new in­ter­pre­ta­tions — it’s much more gen­eral than that. It’s about ex­plor­ing the breadth of in­ter­pre­ta­tions that a given the­ory makes available. It’s about dis­cern­ing prop­er­ties that hold in all pos­si­ble in­ter­pre­ta­tions of a the­ory. It’s about dis­cov­er­ing how well (or poorly) a given the­ory con­strains its in­ter­pre­ta­tions. It’s a toolset used to dis­cuss in­ter­pre­ta­tions in gen­eral.

At its core, model the­ory is the study of what a math­e­mat­i­cal the­ory ac­tu­ally says, when you strip the in­tent from the sym­bols.

Iron walls

Be­fore you can do model the­ory, you have to erect iron walls be­tween four differ­ent con­cepts.

  1. Logics

  2. Languages

  3. Theories

  4. Models

Logics

A logic is a for­mal sys­tem for build­ing and ma­nipu­lat­ing sen­tences. Tra­di­tion­ally, this logic defines a num­ber of sym­bols (( ) ∧ ¬ ∀ ∃ ≡ ν ’, for ex­am­ple) and rules for build­ing sen­tences from those sym­bols.

Note that you can­not gen­er­ate sen­tences from a logic alone. Rather, you use a logic to gen­er­ate sen­tences from a lan­guage.

Also, re­mem­ber that the rules of a logic are syn­tac­tic, such as “if φ is a sen­tence then (¬φ) is a sen­tence”.

Fi­nally, re­mem­ber that log­ics are just rules for gen­er­at­ing sen­tences. A logic is perfectly happy to gen­er­ate sen­tences shaped like x∧(¬x), in spite of all your protests about con­tra­dic­tions.

Languages

A lan­guage is a col­lec­tion of sym­bols. From those sym­bols, us­ing a logic, you can start gen­er­at­ing sen­tences.

For ex­am­ple, in the propo­si­tional logic, us­ing the lan­guage {x, y}, the string hello is surely not a sen­tence (for it fails to use the ap­pro­pri­ate sym­bols). Nor is the string ¬xy a sen­tence: it fails to fol­low the rules of the logic. ((¬x)∧y) is a sen­tence, for it uses the ap­pro­pri­ate sym­bols and fol­lows the given rules.
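To make the logic/language split concrete, here is a minimal sketch of a sentence checker. It assumes a toy fragment of propositional logic with only ¬ and ∧, fully parenthesized, and single-character symbols; the function name `is_sentence` is my own invention for illustration, not anything from the textbook.

```python
def is_sentence(s, language):
    """Return True if string s is a sentence over `language`.

    Toy grammar (an assumption for this sketch):
        sentence ::= symbol | "(¬" sentence ")" | "(" sentence "∧" sentence ")"
    """
    def parse(i):
        # Try to read one sentence starting at index i.
        # Return the index just past it, or None on failure.
        if i >= len(s):
            return None
        if s[i] in language:
            return i + 1
        if s[i] != "(":
            return None
        if i + 1 < len(s) and s[i + 1] == "¬":
            j = parse(i + 2)
        else:
            j = parse(i + 1)
            if j is None or j >= len(s) or s[j] != "∧":
                return None
            j = parse(j + 1)
        if j is None or j >= len(s) or s[j] != ")":
            return None
        return j + 1

    return parse(0) == len(s)


print(is_sentence("hello", {"x", "y"}))      # wrong symbols
print(is_sentence("¬xy", {"x", "y"}))        # right symbols, breaks the rules
print(is_sentence("((¬x)∧y)", {"x", "y"}))   # a sentence
print(is_sentence("(x∧(¬x))", {"x", "y"}))   # contradictory, but still a sentence
print(is_sentence("((¬x)∧y)", {"x"}))        # same logic, smaller language
```

The rules live in the function body (the logic) and the symbols arrive as an argument (the language), so you can hold one fixed while varying the other.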

Many re­sults in model the­ory are achieved by hold­ing the logic fixed and vary­ing the lan­guage, so it’s es­sen­tial that these con­cepts be dis­tinct in your mind.

Theories

A the­ory is a col­lec­tion of sen­tences writ­ten in one lan­guage. For ex­am­ple, in the lan­guage {≤} un­der first-or­der logic, we can dis­cuss the theory

  1. (∀x)(x≤x)

  2. (∀xy)(x≤y)∧(y≤x)→(y≡x)

  3. (∀xyz)(x≤y)∧(y≤z)→(x≤z)

which is the the­ory of or­der. (The ax­ioms above are re­flex­ivity, an­ti­sym­me­try, and tran­si­tivity).

Re­mem­ber that a the­ory is just a set of sen­tences drawn from all available sen­tences. Th­ese sen­tences aren’t par­tic­u­larly spe­cial un­less you make them spe­cial. Sen­tences like (∃x)¬(x≤x) are fine sen­tences built from the lan­guage {≤}, even though they di­rectly con­tra­dict the the­ory. The­o­ries don’t af­fect the sen­tences of a lan­guage — they’re just a grab-bag of some sen­tences that seemed in­ter­est­ing to some­one.

Models

A model is an in­ter­pre­ta­tion of the sen­tences gen­er­ated by a lan­guage. A model is a struc­ture which as­signs a truth value to each sen­tence gen­er­ated by some lan­guage un­der some logic.

(More speci­fi­cally, it’s a struc­ture that as­signs bi­nary val­ues to sen­tences in such a way that we’re jus­tified in the name “truth value”: for ex­am­ple, we re­quire that a model says φ is true if and only if it says that ¬φ is false, and so on.)

Only once we start in­ter­pret­ing sen­tences is it mean­ingful to talk about valid or re­futable sen­tences. Once you have a model of {≤} that hap­pens to say that the ax­ioms 1, 2, and 3 above are true, then you can start talk­ing about how the the­ory of or­der rules out the sen­tence (∃x)¬(x≤x) — be­cause there is no model of the the­ory of or­der which is also a model of this sen­tence.
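As a rough illustration, here is a sketch that checks the three order axioms in a few small finite structures and then evaluates (∃x)¬(x≤x) in each. The helper names and the example domain are my own assumptions. Checking a handful of finite structures proves nothing about all models; it only shows what "this structure says the axioms are true" looks like.

```python
from itertools import product
import operator


def is_model_of_order(domain, leq):
    """Check reflexivity, antisymmetry, and transitivity in (domain, leq)."""
    reflexive = all(leq(x, x) for x in domain)
    antisymmetric = all(
        x == y
        for x, y in product(domain, repeat=2)
        if leq(x, y) and leq(y, x)
    )
    transitive = all(
        leq(x, z)
        for x, y, z in product(domain, repeat=3)
        if leq(x, y) and leq(y, z)
    )
    return reflexive and antisymmetric and transitive


def has_irreflexive_point(domain, leq):
    """Truth value of the sentence (∃x)¬(x≤x) in (domain, leq)."""
    return any(not leq(x, x) for x in domain)


domain = [1, 2, 3, 4, 6, 12]            # hypothetical example domain


def divides(x, y):
    # One possible interpretation of the symbol ≤: "x divides y".
    return y % x == 0


for name, leq in [("usual ≤", operator.le),
                  ("divides", divides),
                  ("strict <", operator.lt)]:
    print(name, is_model_of_order(domain, leq), has_irreflexive_point(domain, leq))
```

Note that divisibility interprets the symbol ≤ perfectly well, which is the earlier point about unintended interpretations. The strict ordering satisfies (∃x)¬(x≤x), and for exactly that reason it is not a model of the theory of order.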

(You can ac­tu­ally talk about how (∃x)¬(x≤x) is in­con­sis­tent with the the­ory of or­der with­out ap­peal­ing to model the­ory, but I find it helpful to treat ev­ery­thing as raw sym­bols un­til in­ter­preted by a model.)

To give a con­crete ex­am­ple, in first or­der logic, us­ing the lan­guage {S, +, *, 0}, the the­ory of ar­ith­metic is the the­ory laid out by the [Peano ax­ioms](http://​​en.wikipe­dia.org/​​wiki/​​Peano_ax­ioms#First-or­der_the­ory_of_ar­ith­metic). The ac­tual nat­u­ral num­bers zero, one, two, … are a model of this the­ory (where zero is the in­ter­pre­ta­tion of 0, one is the in­ter­pre­ta­tion of S0, etc.).

Also, it’s worth noting that any object that interprets sentences and follows the rules of the logic qualifies as a model. There are often many non-isomorphic objects that interpret the same sentences in the same way. For example, the rational numbers and the real numbers (each under addition) are models of group theory that agree on every sentence in the language of groups, despite being different models.

The distinction between these four concepts seems obvious to me in hindsight, but I explicitly remember expending cognitive effort to separate them mentally, so there you go. Make sure these distinctions are wrought in iron before attempting model theory.

The Right to use a name

There’s some­thing about math ed­u­ca­tion in gen­eral that has trou­bled me for quite some time, and which I’m fi­nally able to ar­tic­u­late. It’s quite pos­si­ble that this is a per­sonal nit, since no­body else seems to care — but I’ll share it any­way.

Many math text­books treat prop­er­ties that jus­tify a name of a thing as state­ments about the thing af­ter nam­ing it.

This is a lit­tle ab­stract, so I’ll make a silly ex­am­ple. Imag­ine some­one is try­ing to show that, in cat­e­gory the­ory, com­po­si­tion of ar­rows is as­so­ci­a­tive. They shouldn’t ap­peal to vi­sual in­tu­ition or any di­a­grams of ar­rows.

The con­cept that fol­low­ing ar­rows is an as­so­ci­a­tive op­er­a­tion is so in­grained in the con­cept of “ar­row” that it’s difficult to de­scribe the prop­erty in English with­out sound­ing dumb.

If you move from A to B, then move B-to-D-through-C in one step, and if I fol­low the same paths but move A-to-C-through-B in one step and then from C to D, then we will end up at the same place.

This prop­erty of ar­rows is so stupidly ob­vi­ous that the state­ment is frus­trat­ing. Fur­ther, it hides the fol­low­ing fact:

As­so­ci­a­tive com­po­si­tion be­tween thin­gies is some­thing we must have be­fore we’re jus­tified in call­ing the thin­gies “Ar­rows”.

As­so­ci­a­tive com­po­si­tion is what al­lows you to use the name “ar­row” and draw vi­sual di­a­grams. You can’t ap­peal to my in­tu­ition about “ar­rows” to show that com­po­si­tion is as­so­ci­a­tive. It’s the other way around! Only af­ter you show that your thin­gies have as­so­ci­a­tive com­po­si­tion are you al­lowed to la­bel them as “ar­rows”.

As another example, the axioms of order (above) are what allow us to use the symbol ≤, which appeals to our intuitive idea of order. Really, it’s more honest to say “We have a binary relation R, satisfying

  1. (∀x)R(xx)

  2. (∀xy)R(xy)∧R(yx)→(y≡x)

  3. (∀xyz)R(xy)∧R(yz)→R(xz)

which justifies our use of the symbol ≤ for R.”

I imagine this is not a problem for experienced mathematicians, for whom it goes without saying that you must formally specify (or disregard) all intuitive baggage that comes attached to the names. However, I remember distinctly a number of times when I gnashed my teeth with boredom as teachers made obvious statements (of course ≤ is reflexive, why do we even need to say this?), simply because I didn’t understand this idea.

I men­tion this be­cause the first few sec­tions of the Model The­ory text­book make state­ments that seem quite ob­vi­ous. It’s easy to grind your teeth and say “duh, hurry up”. It’s a lit­tle harder to un­der­stand ex­actly why such things must be said. In that light, I think this is a good piece of ad­vice for learn­ing math­e­mat­ics in gen­eral:

If you find your­self won­der­ing why a state­ment must be said, check whether the state­ment is jus­tify­ing any names.

Bind­ing meaning

The early parts of Model The­ory will go down much eas­ier if you re­al­ize that they’re bind­ing log­i­cal sym­bols to the ap­pro­pri­ate mean­ing (and thus jus­tify­ing the name “model”).

For example, when we state “M models φ∧ψ if and only if it models φ and it models ψ”, it’s easy to say “well duh”. It’s a little harder to understand that this is the mechanism by which the symbol ∧ is bound to the interpretation “and”.
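One way to see the binding happen is to write the satisfaction relation as a recursive function. This is only a sketch for propositional formulas, with my own made-up encoding: symbols are strings, compound formulas are tuples tagged `"not"` or `"and"`, and a model is a dict from symbols to booleans.

```python
def models(M, phi):
    """Return True if the valuation M satisfies the formula phi."""
    if isinstance(phi, str):
        # A bare symbol: look up what the model says about it.
        return M[phi]
    op = phi[0]
    if op == "not":
        # M models ¬φ if and only if M does not model φ.
        return not models(M, phi[1])
    if op == "and":
        # M models φ∧ψ if and only if M models φ and M models ψ.
        # This line is the only place the tag "and" acquires its meaning.
        return models(M, phi[1]) and models(M, phi[2])
    raise ValueError(f"unknown connective: {op!r}")


M = {"x": True, "y": False}
print(models(M, ("and", "x", ("not", "y"))))
print(models(M, ("and", "x", ("not", "x"))))
```

Until the `"and"` clause is written, the tag is just a string. Swap Python's `and` for `or` in that clause and the very same formulas are interpreted differently, which is why the "well duh" statement has to be made.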

Also, note that the ability to distinguish “the symbol + in the language L” from “the addition function as interpreted by the model M” is absolutely crucial.

Totality

Some­thing that kept on bit­ing me was this: Models of first-or­der logic are “to­tal”. They have some­thing to say about ev­ery sen­tence in a lan­guage. Even where a the­ory is in­com­plete, any in­di­vi­d­ual model is “com­plete”. A model of first-or­der logic in­ter­prets func­tion sym­bols by to­tal func­tions and re­la­tions by set-the­o­retic re­la­tions. The re­la­tion­ship is to­tal: for ev­ery sen­tence, ei­ther M⊧φ or M⊧¬φ.

This is a point where my in­tu­itive no­tion of “mod­els as in­ter­pre­ta­tions” de­parted from the ac­tual math­e­mat­i­cal ob­jects un­der con­sid­er­a­tion — func­tions are firmly par­tial-by-de­fault in my mind’s eye.

It’s important to hold firm the distinction between “model” and “theory” here. Remember that number theory is incomplete, while the standard model of number theory is the one that picks “true” for all Gödel sentences, has no infinite numbers, etc. (The difficulty of pinpointing such a model is exactly what the incompleteness theorem is all about.)
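A toy propositional analogue may help keep "incomplete theory" and "total model" apart. This is a sketch under my own assumptions: the language is {x, y}, the theory has the single axiom x∨y, sentences are encoded as Python functions from a valuation to a boolean, and models are the valuations that make every axiom true. It is far simpler than the first-order case, but the shape of the distinction is the same.

```python
from itertools import product

symbols = ["x", "y"]

# A theory is just a set of sentences; here, the single axiom x ∨ y.
theory = [lambda v: v["x"] or v["y"]]

# The sentence we ask about: x.
phi = lambda v: v["x"]

# Every assignment of truth values to the symbols of the language.
valuations = [dict(zip(symbols, bits))
              for bits in product([False, True], repeat=len(symbols))]

# The models of the theory: valuations that satisfy every axiom.
models_of_theory = [v for v in valuations
                    if all(axiom(v) for axiom in theory)]

# Each individual model gives phi a definite verdict ...
verdicts = {phi(v) for v in models_of_theory}

# ... but the theory decides phi only if all of its models agree.
theory_decides_phi = len(verdicts) == 1

print(len(models_of_theory), verdicts, theory_decides_phi)
```

No single valuation ever shrugs: each one says either x or ¬x. The theory fails to settle x only because its models disagree with each other.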

Be aware that the math­e­mat­i­cal defi­ni­tion of a model may not match your in­tu­itive idea of “a struc­ture which in­ter­prets a the­ory”, es­pe­cially if you’re com­ing from com­puter sci­ence (or other con­struc­tive fields).


None of this is par­tic­u­larly novel. Rather, this is a col­lec­tion of dis­tinc­tions and clar­ifi­ca­tions that would have made my life a bit eas­ier when be­gin­ning the text­book.

In my case, I didn’t have any of these con­cepts wrong, per se — rather, I had them fuzzy. The above dis­tinc­tions were not yet fleshed out in my mind. This post pro­vides a con­text for model the­ory; a taste of the type of think­ing you must be ready to think.

I was origi­nally go­ing to use this as con­text for what I’ve learned in model the­ory so far, but this post took longer than ex­pected. I’ll fol­low up to­mor­row.