Inversion of theorems into definitions when generalizing

This post describes a pattern of abstraction that is common in mathematics, which I haven’t seen described in explicit terms elsewhere. I would appreciate pointers to any existing discussions. Also, I would appreciate more examples of this phenomenon, as well as corrections and other insights!

Note on prerequisites for this post: in the opening example below, I assume familiarity with linear algebra and plane geometry, so this post probably won’t make much sense without at least some superficial knowledge of these subjects. In the second part of the post, I give a bunch of further examples of the phenomenon, but these examples are all independent, so if you haven’t studied a particular subject before, that specific example might not make sense, but you can just skip it and move on to the ones you do understand.

There is something peculiar about the dependencies among the following concepts in math:

  • Pythagorean theorem

  • Law of cosines

  • Angle between two vectors

  • Dot product

  • Inner product

In the Euclidean geometry of $\mathbf{R}^2$ (the plane) and $\mathbf{R}^3$ (three-dimensional space), one typically goes through a series of steps like this:

  1. Using the axioms of Euclidean geometry (in particular the parallel postulate), we prove the Pythagorean theorem.

  2. We take the right angle to have angle $\pi/2$ and calculate other angles in terms of this one.

  3. The Pythagorean theorem allows us to prove the law of cosines (there are many proofs of the law of cosines, but this is one way to do it).

  4. Now we make the Cartesian leap to analytic geometry, and start treating points as strings of numbers in some coordinate system. In particular, the Pythagorean theorem now gives us a formula for the distance between two points, and the law of cosines can also be restated in terms of coordinates.

  5. Playing around with the law of cosines (stated in terms of coordinates) yields the formula $x_1 y_1 + x_2 y_2 = \|x\|\,\|y\|\cos\theta$, where $x = (x_1, x_2)$ and $y = (y_1, y_2)$ are two vectors and $\theta$ is the angle between them (and similarly for three dimensions), which motivates us to define the dot product $x \cdot y$ (as being precisely this quantity). A sketch of this algebra appears just below.

In other words, we take angle and distance as primitive, and derive the inner product (which is the dot product in the case of Euclidean spaces).
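To spell out the algebra in step 5 (a quick sketch, in the plane): apply the law of cosines to the triangle with vertices $0$, $x$, and $y$, whose side lengths are $\|x\|$, $\|y\|$, and $\|x - y\|$:

$$\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\|x\|\,\|y\|\cos\theta.$$

Expanding the left side in coordinates gives $(x_1 - y_1)^2 + (x_2 - y_2)^2 = \|x\|^2 + \|y\|^2 - 2(x_1 y_1 + x_2 y_2)$, and comparing the two right-hand sides yields $x_1 y_1 + x_2 y_2 = \|x\|\,\|y\|\cos\theta$.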

But now, consider what we do in (abstract) linear algebra:

  1. We have a vector space, which is a structured space satisfying some funny axioms.

  2. We define an inner product $\langle x, y \rangle$ between two vectors $x$ and $y$, which again satisfies some funny properties.

  3. Using the inner product, we define the length of a vector $x$ as $\|x\| := \sqrt{\langle x, x \rangle}$, and the distance between two vectors $x$ and $y$ as $d(x, y) := \|x - y\|$.

  4. Using the inner product, we define the angle between two non-zero vectors $x$ and $y$ as the unique number $\theta \in [0, \pi]$ satisfying $\cos\theta = \frac{\langle x, y \rangle}{\|x\|\,\|y\|}$.

  5. Using these definitions of length and angle, we can now verify the Pythagorean theorem and law of cosines (see the sketch below).

In other words, we have now taken the inner product as primitive, and derived angle, length, and distance from it.
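To see this inverted order of definitions in action, here is a minimal sketch in Python. The matrix $M$ below is made up for illustration (any symmetric positive-definite matrix gives a valid inner product on $\mathbf{R}^2$), and all the function names are mine; the point is that length, distance, and angle are computed from the inner product, never the other way around, and the classical theorems then hold automatically.

```python
import math

# A made-up symmetric positive-definite matrix M; the inner product
# <x, y> = x^T M y is then a valid (non-standard) inner product on R^2.
M = [[2.0, 1.0],
     [1.0, 3.0]]

def inner(x, y):
    """Step 2: the inner product is the primitive notion."""
    return sum(M[i][j] * x[i] * y[j] for i in range(2) for j in range(2))

def norm(x):
    """Step 3: length is *defined* from the inner product."""
    return math.sqrt(inner(x, x))

def dist(x, y):
    """Step 3: distance is defined from length."""
    return norm([x[0] - y[0], x[1] - y[1]])

def angle(x, y):
    """Step 4: the angle is defined by cos(theta) = <x,y> / (||x|| ||y||)."""
    return math.acos(inner(x, y) / (norm(x) * norm(y)))

x, y = [1.0, 0.0], [-1.5, 1.0]  # arbitrary test vectors

# Step 5: the law of cosines now holds by construction.
lhs = dist(x, y) ** 2
rhs = norm(x) ** 2 + norm(y) ** 2 - 2 * norm(x) * norm(y) * math.cos(angle(x, y))
print(abs(lhs - rhs) < 1e-9)  # True

# Step 5: the Pythagorean theorem holds for orthogonal vectors.
# With x = (1, 0), <x, v> = 2*v[0] + 1*v[1], so v = (1, -2) is orthogonal to x.
v = [1.0, -2.0]
print(abs(inner(x, v)) < 1e-12)                                      # True
print(abs(dist(x, v) ** 2 - (norm(x) ** 2 + norm(v) ** 2)) < 1e-9)   # True
```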

Here is a shot at describing the general phenomenon:

  • We start in a concrete domain, where we have two notions $A$ and $B$, where $A$ is a definition and $B$ is some theorem. (In the example above, $A$ is length/angle and $B$ is the inner product, or rather, $B$ is the theorem which states the equivalence of the algebraic and geometric expressions for the dot product.)

  • We find some abstractions/generalizations of the concrete domain.

  • We realize that in the abstract setting, we want to talk about $A$ and $B$, but it’s not so easy to see how to talk about them (because the setting is so abstract).

  • At some point, someone realizes that instead of trying to define $A$ directly (as in the concrete case), it’s better to generalize/“find the principles” that make $B$ tick. We factor out these principles as the axioms of $B$.

  • Finally, using $B$, we can define $A$.

  • We go back and check that in the concrete domain, we can do this same inverted process.

Here is a table that summarizes this process:

| Notion | Concrete case | Generalized case |
|--------|---------------|------------------|
| $A$ | primitive; defined on its own terms | defined in terms of $B$ |
| $B$ | a theorem | defined axiomatically |

In what sense is this pattern of generalization “allowed”? I don’t have a satisfying answer here, other than saying that generalizing in this particular way turned out to be useful/interesting. It seems to me that there is a large amount of trial-and-error and art involved in picking the correct theorem to use as the $B$ in the process. I will also say that explicitly verbalizing this process has made me more comfortable with inner product spaces (previously, I just had a vague feeling that “something is not right”).

Here are some other examples of this sort of thing in math. In the following examples, the step of using $B$ to define $A$ does not take place (in this sense, the inner product case seems exceptional; I would greatly appreciate hearing about more examples like it).

  • Metric spaces: in Euclidean geometry, the triangle inequality is a theorem. But in the theory of metric spaces, the triangle inequality is taken as part of the definition of a metric.

  • Sine and cosine: in middle school, we define these functions in terms of angles and ratios of side lengths of a triangle. Then we can prove various things about them, like the power series expansion $\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$. When we generalize to complex inputs, we then take the series expansion as the definition (see the first sketch after this list).

  • Probability: in elementary probability, we define the probability of an event as the number of successful outcomes divided by the number of all possible outcomes. Then we notice that this definition satisfies some properties, namely: (1) the probability is always nonnegative; (2) if an event happens for certain, then its probability is $1$; (3) if we have some mutually exclusive events, then we can find the probability that at least one of them happens by summing their individual probabilities. When we generalize to cases where the outcomes are crazy (namely, countably or uncountably infinite), instead of defining probability as a ratio, we take the properties (1), (2), (3) as the definition.

  • Conditional probability: when working with finite sets, we can define the conditional probability as $\Pr(A \mid B) := \frac{|A \cap B|}{|B|}$. We then see that if $\Omega$ is the (finite) sample space, we have $\Pr(A \mid B) = \frac{|A \cap B| / |\Omega|}{|B| / |\Omega|} = \frac{\Pr(A \cap B)}{\Pr(B)}$. But now when we move to infinite sets, we just define the conditional probability as $\Pr(A \mid B) := \frac{\Pr(A \cap B)}{\Pr(B)}$.

  • Convergence in metric spaces: in basic real analysis in $\mathbf{R}$, we say that $x_n \to x$ if the sequence satisfies some epsilon condition (this is the definition). Then we can prove that $x_n \to x$ if and only if $d(x_n, x) \to 0$, where $d(x, y) := |x - y|$. Then in more general metric spaces, we define “$x_n \to x$” to mean that $d(x_n, x) \to 0$. (Actually, this example is cheating a little, since we can just take the epsilon condition and swap in $d(x_n, x)$ for $|x_n - x|$.)

  • Differentiability: in single-variable calculus, we define the derivative $f'(x)$ to be $\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$ if this limit exists. We can then prove that $f'(x) = L$ if and only if $\lim_{h \to 0} \frac{|f(x+h) - f(x) - Lh|}{|h|} = 0$. This latter limit is an expression that makes sense in the several-variable setting (with absolute values replaced by norms and $L$ by a linear map), and is what we use to define differentiability (see the second sketch after this list).

  • Continuity: in basic real analysis, we define continuity using an epsilon–delta condition. Later, we prove that this is equivalent to a statement involving open sets (the preimage of every open set is open). Then in general topology we take the open sets statement as the definition of continuity.

  • (Informal.) Arithmetic: in elementary school arithmetic, we “intuitively apprehend” the rational numbers. We discover (as theorems) that two rational numbers $a/b$ and $c/d$ are equal if and only if $ad = bc$, and that the rationals have the addition rule $\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}$. But in the formal construction of number systems, we define the rationals as equivalence classes of pairs of integers $(a, b)$ (with non-zero second coordinate), where $(a, b) \sim (c, d)$ iff $ad = bc$, and define addition on the rationals by $(a, b) + (c, d) := (ad + bc, bd)$. Here we aren’t really even generalizing anything, just formalizing our intuitions (see the third sketch after this list).

  • (Somewhat speculative.) Variance: if a random variable $X$ has a normal distribution, its probability density can be parametrized by two parameters, $\mu$ and $\sigma$, which have intuitive appeal (by varying these parameters, we can change the shape of the bell curve in predictable ways). Then we find out that $\sigma$ has the property $\sigma^2 = \mathrm{E}[(X - \mu)^2]$. This motivates us to define the variance as $\mathrm{Var}(X) := \mathrm{E}[(X - \mathrm{E}[X])^2]$ for other random variables (which might not have such nice parametrizations).
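As promised in the sine and cosine example, here is a first sketch (the name sin_series is mine): the power series, provable as a theorem for real inputs, is taken verbatim as the definition for complex inputs, and agrees with the library functions in both settings.

```python
import cmath
import math

def sin_series(z, terms=20):
    """sin(z) *defined* by its power series: z - z^3/3! + z^5/5! - ...

    The formula makes sense whether z is real or complex; for complex z
    it is the definition rather than a theorem.
    """
    return sum((-1) ** n * z ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

# On real inputs, the series agrees with the geometric definition.
print(abs(sin_series(1.0) - math.sin(1.0)) < 1e-12)   # True

# On complex inputs, the series *is* the definition; cmath agrees.
z = 0.5 + 1.0j
print(abs(sin_series(z) - cmath.sin(z)) < 1e-12)      # True
```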
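For the differentiability example, a second, numerical sketch (error_quotient is my own name for the quantity inside the limit): with $f(x) = x^2$ at $x = 3$, the quotient tends to $0$ precisely for the correct candidate $L = 6$.

```python
def error_quotient(f, x, L, h):
    """|f(x+h) - f(x) - L*h| / |h|, the quantity whose vanishing limit
    defines differentiability in the several-variable setting."""
    return abs(f(x + h) - f(x) - L * h) / abs(h)

f = lambda x: x ** 2
for h in (0.1, 0.01, 0.001):
    # With L = f'(3) = 6 the quotient shrinks like |h| (here it equals |h|);
    # with the wrong candidate L = 5 it stalls near |5 - 6| = 1.
    print(error_quotient(f, 3.0, 6.0, h), error_quotient(f, 3.0, 5.0, h))
```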
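Finally, for the arithmetic example, a third sketch of the construction (the class name is mine; I skip reduction to lowest terms, since the equivalence relation makes it unnecessary for testing equality):

```python
class Rational:
    """A rational number as a pair of integers (a, b) with b != 0."""

    def __init__(self, a, b):
        if b == 0:
            raise ValueError("second coordinate must be non-zero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # The old theorem a/b = c/d iff ad = bc, now the *definition*
        # of equality (i.e. of the equivalence relation on pairs).
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        # The old addition rule a/b + c/d = (ad + bc)/(bd), now the
        # definition of addition on pairs.
        return Rational(self.a * other.b + self.b * other.a,
                        self.b * other.b)

print(Rational(1, 2) == Rational(2, 4))                   # True: (1,2) ~ (2,4)
print(Rational(1, 2) + Rational(1, 3) == Rational(5, 6))  # True
```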