Draft/​wiki: Infinities and measuring infinite sets: A quick reference

EDIT: This is now on the Wiki as “Quick refer­ence guide to the in­finite”. Do what you want with it.

It seems when­ever any­thing in­volv­ing in­fini­ties or mea­sur­ing in­finite sets comes up it gen­er­ates a lot of con­fu­sion. So I thought I would write a quick guide to both to

  1. Ad­dress com­mon confusions

  2. Act as a use­ful refer­ence (per­haps this should be a wiki ar­ti­cle? This would benefit from oth­ers be­ing able to edit it; there’s no “com­mu­nity wiki mode” on LW, huh?)

  3. Re­mind peo­ple that some­times in­vent­ing a new sort of an­swer is nec­es­sary!

I am try­ing to keep this con­cise, in some cases sub­sti­tut­ing Wikipe­dia links for ex­pla­na­tion, but I do want what I have writ­ten to be un­der­stand­able enough and in­for­ma­tive enough to an­swer the com­monly oc­cur­ring ques­tions. Please let me know if you can de­tect a par­tic­u­lar prob­lem. I wrote this very quickly and ex­pect it still needs quite a bit more work to be un­der­stand­able to some­one with very lit­tle math back­ground.

I re­al­ize many peo­ple here are fini­tists of one stripe or an­other but this comes up of­ten enough that this seems use­ful any­way. Apolo­gies to any con­struc­tivists, but I am go­ing to as­sume clas­si­cal logic, be­cause it’s all I know, though I am point­ing out ex­plic­itly any uses of choice. (For what this means and why any­one cares about this, see this com­ment.) Also as I in­tend this as a refer­ence (is there *any* way we can make this ed­itable?) some of this may be things that I do not ac­tu­ally know but merely have read.

Note that these are two sep­a­rate top­ics, though they have a bit of over­lap.

Pri­mar­ily, though, my main in­ten­tion is to put an end to the fol­low­ing, which I have seen here far too of­ten:

Myth #0: All in­fini­ties are in­finite car­di­nals, and car­di­nal­ity is the main method used to mea­sure size of sets.

The fact is that “in­finite” is a gen­eral term mean­ing “larger (in some sense) than any nat­u­ral num­ber”; differ­ent sys­tems of in­finite num­bers get used de­pend­ing on what is ap­pro­pri­ate in con­text. Fur­ther­more, there are many other meth­ods of mea­sur­ing sizes of sets, which sac­ri­fice uni­ver­sal­ity for higher re­s­olu­tion; car­di­nal­ity is a very coarse-grained mea­sure.

Topic #1: Sys­tems of in­fini­ties (for do­ing ar­ith­metic with)

Car­di­nal numbers

First, a re­view of what they rep­re­sent and how they work at the ba­sic level, be­fore we get to their ar­ith­metic.

Car­di­nal num­bers are used for mea­sur­ing sizes of sets when we don’t know, or don’t care, about the set’s con­text or com­po­si­tion. First, the stan­dard ex­pla­na­tion of what we mean by this: Say we have two farm­ers, who each have a large num­ber of sheep, more than they can count. How can they de­ter­mine who has more? They pair off the sheep of the one against the sheep of the other; whichever has sheep left over, has more.

So given two sets X and Y, we will say X has smaller car­di­nal­ity than Y (de­noted |X|≤|Y|, or some­times #X≤#Y) if there is a way to as­sign to each el­e­ment x of X, a cor­re­spond­ing el­e­ment f(x) of Y, such that no two dis­tinct x1 and x2 from X cor­re­spond to the same el­e­ment of Y. If, fur­ther­more, this cor­re­spon­dence cov­ers all of Y—if for each y in Y there is some x in X that had y as­signed to it—then we say that X and Y have the same car­di­nal­ity, |X|=|Y| or #X=#Y.

Note that by this defi­ni­tion, the set N of nat­u­ral num­bers, and the set 2N of even in­te­gers, have the same size, since we can match up 1 with 2, 2 with 4, 3 with 6, etc. This even though it seems 2N should be only “half as large” as N! This is why I em­pha­size: Car­di­nal­ity is only one way of mea­sur­ing sizes of sets, one that is not fine enough to dis­t­in­guish be­tween 2N and N. Other meth­ods of mea­sur­ing their size will have 2N only half as large as N.

It is true, but not ob­vi­ous, that if |X|≤|Y| and |Y|≤|X|, then |X|=|Y|; this is the Schroeder-Bern­stein the­o­rem. Hence we can sen­si­bly talk about “the car­di­nal­ity” of a set X as be­ing some ab­stract prop­erty of it—if |X|≤|Y| then X has smaller car­di­nal­ity and Y has larger car­di­nal­ity, and so on. We can make this more con­crete, and define an ac­tual car­di­nal­ity ob­ject |X| (or #X), us­ing ei­ther the ax­iom of choice or Scott’s trick (if you ad­mit the ax­iom of foun­da­tion) or even proper classes if we ad­mit those, but this will not be rele­vant here. We will use |X|<|Y| to mean “|X|≤|Y| but |X|≠Y”.

Note that it is also not ob­vi­ous that given any two sets X and Y, we must have ei­ther |X|≤|Y| or |Y|≤|X|; in­deed, this state­ment is true if and only if we ad­mit the ax­iom of choice. So take note:

Myth #1: In­fini­ties must come in a lin­ear or­der­ing.

Fact: If the ax­iom of choice is false, then there are nec­es­sar­ily in­finite car­di­nals which are not the same size, and yet for which nei­ther can be said to be larger! If we do ad­mit the ax­iom of choice, then the car­di­nal num­bers must be not only lin­early-or­dered but in fact be well-or­dered.

The car­di­nal­ity of the set of nat­u­ral num­bers, |N|, is also de­noted ℵ0. If we ad­mit the ax­iom of [de­pen­dent] choice, this is the small­est in­finite car­di­nal. Here by “in­finite” car­di­nal I mean one that is larger than the size of any finite set (0, 1, 2, etc.).

Quick aside on par­tial orderings

Many of you are prob­a­bly won­der­ing how to think about some­thing like “nei­ther larger nor smaller, but not the same”. For­mally, we say that, with­out choice, the or­der­ing on the car­di­nal num­bers is a par­tial or­der. Be­cause these are so com­mon I’ll go ahead and define this here—gen­er­ally, a par­tial or­der on a set S is a re­la­tion (usu­ally de­noted “≤”) on S such that:

  1. For ev­ery x in S, x≤x (re­flex­ivity)

  2. For any x and y in S, if x≤y and y≤x, then x=y (an­ti­sym­me­try)

  3. For any x,y,z in S, if x≤y and y≤z, then x≤z (tran­si­tivity)

If we ad­di­tion­ally re­quired that for any x and y in S, we have ei­ther x≤y or y≤x, we’d have a to­tal or­der (also called a lin­ear or­der).

OK, but still, what does “nei­ther larger nor smaller, yet not the same” mean in gen­eral? How can you vi­su­al­ize it? Well, the canon­i­cal ex­am­ple of a par­tial or­der would be, if we have any set S, we can par­tially or­der its sub­sets by defin­ing A≤B to mean A⊆B. So if S={1,2,3,4}, then {1,2} is larger than {1} and {2}, and smaller than {1,2,4}, but in­com­pa­rable to {3} or {2,3} or {2,3,4}.

Another ex­am­ple would be, if we have or­dered n-tu­ples of real num­bers, we could define (x1,...,xn)≤(y1,...,yn) if xi≤yi for each i. You might imag­ine these as, say, stats of char­ac­ters in a game; then x≤y would mean that char­ac­ter y is bet­ter than char­ac­ter x in ev­ery way. To say that x and y are in­com­pa­rable would mean that—though in prac­tice one might be bet­ter on the whole—nei­ther is ob­vi­ously bet­ter. More gen­er­ally, in any game, you could define a par­tial or­der on strate­gies by x≤y if y dom­i­nates x.

Note that par­tial or­ders are suffi­ciently com­mon that for many math peo­ple the word “or­der” means “par­tial or­der” by de­fault.

Car­di­nal arithmetic

Given sets X and Y, |X|+|Y| will de­note the car­di­nal­ity of the “dis­joint union” of X and Y, which is the union of X and Y, but with each el­e­ment tagged with which of the two it came from, so that we don’t lose any­thing to over­lap (i.e., if an el­e­ment is in both X and Y, it will oc­cur twice, once with an “X” tag and once with a “Y” tag.) |X||Y| will de­note the car­di­nal­ity of the set X×Y, the Carte­sian product of X and Y, which is the set of all or­dered pairs (x,y) with x in X and y in Y. How­ever, if we ad­mit the ax­iom of choice, this ar­ith­metic is not very in­ter­est­ing for in­finite sets! It turns out that given car­di­nal num­bers μ and λ, if ei­ther is in­finite and nei­ther is zero, then μ+λ=μλ=max(μ,λ). Hence, if you need a sys­tem of in­fini­ties in which x+y is go­ing to be strictly big­ger than x and y, car­di­nal num­bers are the wrong choice. (The ar­ith­metic of car­di­nals gets more in­ter­est­ing once you al­low for adding or mul­ti­ply­ing in­finitely many at once.)

There is also ex­po­nen­ti­a­tion of car­di­nals; |X||Y| de­notes the car­di­nal­ity of the set XY of all func­tions from Y to X, i.e., the num­ber of ways of pick­ing one el­e­ment of X for each el­e­ment of Y. Given any set X, 2|X| is the car­di­nal­ity of its power set ℘(X), the set of all its sub­sets. Can­tor’s di­ag­o­nal ar­gu­ment shows that for any set X, 2|X|>|X|; in par­tic­u­lar, there is no largest car­di­nal num­ber.

Ap­pli­ca­tion: Mea­sur­ing sizes of sets when we don’t care about the con­text or com­po­si­tion.

Or­di­nal numbers

I’m afraid there’s no quick way to ex­plain these. The rea­son is that they are used to rep­re­sent two things—ways of well-or­der­ing things, and po­si­tions in an “in­finite list”—ex­cept, of course, that these are ac­tu­ally fun­da­men­tally the same thing, and to un­der­stand or­di­nals you need to wrap your head around this un­til you can see both si­mul­ta­neously. Hence I sug­gest you just go read Wikipe­dia, or some other stan­dard text, if you want to learn how these work. I will just speak briefly on their ar­ith­metic. Note that the or­di­nals too are or­dered—lin­early or­dered and well-or­dered, at that.

Un­like with the car­di­nals, ad­di­tion and mul­ti­pli­ca­tion of two or­di­nals will of­ten get you a larger or­di­nal. In par­tic­u­lar, for any or­di­nal λ, λ+1 is a larger or­di­nal. How­ever the mul­ti­pli­ca­tion of or­di­nals is non­com­mu­ta­tive. In fact, even the ad­di­tion of or­di­nal num­bers is non­com­mu­ta­tive! And dis­tribu­tivity only holds on one side; a(b+c)=ab+ac, but (a+b)c need not be ac+bc. So if you need com­mu­ta­tivity, or­di­nals (with their usual op­er­a­tions) are the wrong choice.

Con­trast the small­est in­finite or­di­nal, de­noted ω, with ℵ0, which is (as­sum­ing choice) the small­est in­finite car­di­nal. 1+ℵ0=ℵ0+1=ℵ0, and 1+ω=ω, but ω+1>ω. 2ℵ0=ℵ02=ℵ0, and 2ω=ω, but ω2>ω. ℵ02=ℵ0, but ω2>ω. And in a re­ver­sal of what you might ex­pect if you just com­plete the pat­tern, 2^ℵ0>ℵ0, but 2ω=ω.

Ap­pli­ca­tion: See link.

Or­di­nal num­bers… with nat­u­ral operations

There’s an al­ter­nate way of do­ing ar­ith­metic on the or­di­nals, referred to as the “nat­u­ral op­er­a­tions”. Th­ese sac­ri­fice the con­ti­nu­ity prop­er­ties of the or­di­nary op­er­a­tions, but in re­turn get com­mu­ta­tivity, dis­tribu­tivity, can­cel­la­tion… the things we need to make the alge­bra nice. There’s a nat­u­ral ad­di­tion, a nat­u­ral mul­ti­pli­ca­tion, and ap­par­ently a nat­u­ral ex­po­nen­ti­a­tion, though I don’t know what that last one might be.

If you’ve heard “the or­di­nals em­bed in the sur­re­als”, and were very con­fused by that state­ment be­cause the sur­re­als are com­mu­ta­tive when the or­di­nals are not, the an­swer is that the cor­rect state­ment is that the or­di­nals with nat­u­ral op­er­a­tions em­bed in the sur­re­als, rather than the or­di­nals with their usual op­er­a­tions.

The ex­tended [pos­i­tive] real line

Some­times, we just use the set of non­nega­tive real num­bers with an in­finity el­e­ment (de­noted ∞, un­sur­pris­ingly) tacked on. Be­cause some­times that’s all you need. So:

Myth #2: Any place where you have in­fini­ties, you have the pos­si­bil­ity for differ­ing de­grees of in­finity.

Fact: Some­times such a thing just wouldn’t make sense.

Ap­pli­ca­tion: This is what we do in mea­sure the­ory—i.e. any­where in­te­gra­tion or ex­pected value (and hence, in the usual for­mu­la­tions, util­ity) is in­volved. If you want to claim that in your util­ity func­tion, op­tions A and B both have in­finite util­ity, but the util­ity of B is more in­finite than that of A… first you’re go­ing to have to make a frame­work in which that makes sense. Such a thing might in­deed make sense, but you’ll have to ex­plain how, as our usual frame­work for util­ity doesn’t al­low such things. (The prob­lem is that adding mul­ti­ple dis­tinct in­fini­ties tends to ruin the con­ti­nu­ity prop­er­ties of the real num­bers that make in­te­gra­tion pos­si­ble in the first place, but I’m sure if you look some­one must have come up with some method for get­ting around that in some cases.)

Some­times we al­low nega­tive num­bers and -∞ as well, though this can cause a prob­lem be­cause there’s no sen­si­ble way to define ∞+(-∞). (0∞, on the other hand, is just 0. We make this defi­ni­tion be­cause, e.g., the area of an in­finitely-long-but-in­finitely-thin line should still be 0.)

The pro­jec­tive line

Some­times we don’t even care about the dis­tinc­tion be­tween a “pos­i­tive in­finity” and a “nega­tive in­finity”; we just need some­thing that rep­re­sents some­thing larger in mag­ni­tude than all real num­bers, but which you’d ap­proach re­gard­less of whether you got large and nega­tive or large and pos­i­tive. So we take the real num­bers R, tack on an in­finity el­e­ment ∞, and we have the real pro­jec­tive line. Note that this doesn’t de­pend at all on the real num­bers be­ing or­dered, so we can do the same with the com­plex num­bers and get the com­plex pro­jec­tive line, a.k.a. the Rie­mann sphere.

Ap­pli­ca­tion: If you want to as­sign 1/​x some con­crete “value” when x=0, well, this isn’t go­ing to make sense in a sys­tem where you have to dis­t­in­guish ∞ from -∞.

Hyper­real numbers

What non­stan­dard anal­y­sis uses. Th­ese are more used as a means to de­duce prop­er­ties of the real num­bers than used for their own sake. You can’t even speak of “the” hyp­per­real num­bers, be­cause then you’d have to spec­ify what ul­tra­filter you were us­ing. Even just prov­ing these ex­ist re­quires a form of choice. You prob­a­bly don’t want to use these to rep­re­sent any­thing.

The sur­real num­bers: the in­finity kitchen sink*

For when you ab­solutely, pos­i­tively, have to make sense of an ex­pres­sion in­volv­ing in­finite quan­tities. The sur­real num­bers are pretty much as in­finite as you could pos­si­bly want. They con­tain the or­di­nals with their nat­u­ral op­er­a­tions, but they al­low for so much more. Do you need to take the nat­u­ral log­a­r­ithm of ω? And then di­vide π by it? And then raise the whole thing to the √(ω2+πω) power? And then sub­tract ω√8? In the sur­real num­bers, this all makes sense. Some­how. (And if you need square roots of nega­tive num­bers, you can always pass to the sur­com­plex num­bers, which I guess is the ac­tual kitchen sink.)

*The char­ac­ter­is­tic 0 in­finity kitchen sink, any­way. Char­ac­ter­is­tic 2 has its own in­finity kitchen sink, the nim­bers. I don’t know about other char­ac­ter­is­tics. I also have to won­der if there’s some set of char­ac­ter­is­tic 0 “in­finity kitchen sinks” that nat­u­rally ex­tend the p-adics...

Ap­pli­ca­tion: Again, kitchen sink.

...and many more

Often the thing to do is make an ad-hoc sys­tem to fit the oc­ca­sion. For in­stance, we could sim­ply take the real num­bers R and tack on an el­e­ment ∞, in­sist it obey the or­di­nary rules of alge­bra, and or­der ap­pro­pri­ately. (For­mally, take the ring R[T], and or­der lex­i­co­graph­i­cally. Then per­haps ex­tend to R(T), or what­ever else you might like. And of course call it “∞” rather than “T”.) So (∞+1)(∞-1)=∞2-1, etc. What is this good for? I have no idea, but it’s a sim­ple brute-force way of toss­ing in in­fini­ties when needed.

Also: func­tions, which are prob­a­bly more ap­pro­pri­ate a lot of the time

Let’s not for­get—of­ten­times the ap­pro­pri­ate thing to do is not to start toss­ing about in­fini­ties at all, but rather shift from think­ing about num­bers to think­ing about func­tions. You know what’s larger than any con­stant num­ber? x. What’s even larger? x2. (If we only con­sider polyno­mial func­tions, this is equiv­a­lent to the “brute-force” sys­tem above, un­der the equiv­alence x↔∞.) Much larger? ex. Is x too large? Maybe you want log x. Etc.

Topic #2: Ways of mea­sur­ing in­finite sets

The thing about mea­sur­ing in­finite sets is that we have a trade-off be­tween dis­crim­i­na­tion and ap­pli­ca­bil­ity. Car­di­nal­ity can be ap­plied to any set at all, but it’s a very coarse-grained way of mea­sur­ing things. If you want to mea­sure a sub­set of the plane, you’d be bet­ter off ask­ing for its area… just don’t think you can ask for the “area” of a set of in­te­gers.

Car­di­nal num­bers (again)

The most ba­sic method. Every set has a car­di­nal­ity. But the cost of such uni­ver­sal­ity is a very low re­s­olu­tion. The set of nat­u­ral num­bers has car­di­nal­ity ℵ0, but so does the set of even num­bers, the set of ra­tio­nal num­bers, the set of alge­braic num­bers, the set of com­putable real num­bers...

Note that the set of real num­bers is much larger and has car­di­nal­ity 2^ℵ0. (This is not to be con­fused with ℵ1, which (as­sum­ing choice again) is the sec­ond-small­est in­finite car­di­nal. The ques­tion of whether 2^ℵ0=ℵ1 is known as the con­tinuum hy­poth­e­sis.)

If we are work­ing with sub­sets T of a given set S, we can do a bit bet­ter by not just look­ing at |T|, but also at |S-T| (the size of the com­ple­ment of T in S). For in­stance, the set of nat­u­ral num­bers greater than 8, and the set of even nat­u­ral num­bers, both have car­di­nal­ity ℵ0, but within the con­text of the nat­u­ral num­bers, the former has finite com­ple­ment (num­bers at most 8), while the lat­ter has in­finite com­ple­ment (all odd num­bers).

Oc­ca­sion­ally: ordinals

If the sets you’re work­ing with come with well-or­der­ings, you can con­sider the type of well-or­der­ing as a “size”, and thus mea­sure sizes with or­di­nals. If they don’t have well-or­der­ings, this doesn’t ap­ply.

Mea­sure: the old fallback

Most com­monly we use the no­tion of a mea­sure to mea­sure sizes of sub­sets T of a given set S. This just means that we des­ig­nate some of the sub­sets T of S as “mea­surable” (with a few re­quire­ments—the whole set S must be mea­surable; com­ple­ments of mea­surable sets must be mea­surable; a union of countably many mea­surable sets must be mea­surable) and as­sign them a num­ber called their mea­sure, which I’ll de­note μ(T). μ takes val­ues in the ex­tended pos­i­tive real line (see above): It can be any non­nega­tive real num­ber, or just a flat ∞. We re­quire that the empty set have mea­sure 0, that if A and B are dis­joint sets then μ(A∪B)=μ(A)+μ(B) (called “finite ad­di­tivity”), and more gen­er­ally that if we have a countable col­lec­tion of sets A1, A2, …, with none of them over­lap­ping any of the oth­ers, then the mea­sure of their union is the sum of their mea­sures. (Called “countable ad­di­tivity”; this in­finite sum au­to­mat­i­cally makes sense be­cause all the num­bers in­volved are non­nega­tive.)

The func­tion μ it­self is called a mea­sure on S. So if we have a set S and a mea­sure on it, we have a way to mea­sure the sizes of sub­sets of it (well, the mea­surable ones, any­way). Of course, this is all very non-spe­cific; by it­self, this doesn’t help us much.

For­tu­nately, the set of real num­bers R comes equipped with a nat­u­ral mea­sure, known as Lebesgue mea­sure. So does n-di­men­sional Eu­clidean space for ev­ery n. And in­deed so do a lot of the nat­u­ral spaces we en­counter. So while sim­ply shout­ing “there’s a mea­sure!”, with­out stat­ing what that mea­sure might be, does not solve any prob­lems, in prac­tice there’s of­ten one nat­u­ral mea­sure (up to mul­ti­pli­ca­tion by some pos­i­tive con­stant). See in par­tic­u­lar: Haar mea­sure.

If we have a set S with a mea­sure μ such that μ(S)=1, then we have a prob­a­bil­ity space. This is how we for­mal­ize prob­a­bil­ity math­e­mat­i­cally: We have some set S of pos­si­bil­ities, equipped with a mea­sure, and the mea­sure of a set of pos­si­bil­ities is its prob­a­bil­ity. Ex­cept, of course, that I’m sure many here would in­sist only on finite ad­di­tivity rather than countable ad­di­tivity...

Note that if μ(S) is finite, then μ(S-T)=μ(S)-μ(T). How­ever, if μ(S)=∞, and μ(T)=∞ also, this doesn’t work; ∞-∞ is not defined in this con­text, and μ(S-T) could be any ex­tended non­nega­tive real num­ber. So note that if we’re work­ing in a set of in­finite mea­sure, and we’re com­par­ing sub­sets which them­selves have in­finite mea­sure, we can pos­si­bly gain some ex­tra in­for­ma­tion by com­par­ing the mea­sures of the com­ple­ments as well.

Here on LessWrong, when dis­cussing mul­ti­verse-based no­tions, we’ll typ­i­cally as­sume that the set of uni­verses comes equipped in some way with a nat­u­ral mea­sure. If the uni­verses are the many wor­lds of MWI, then this mea­sure will be pro­por­tional to squared-norm-of-am­pli­tude.

Mea­sur­ing sub­sets of the nat­u­ral numbers

So it seems like 2N should be half the size of N, right? Well there’s an easy way to ac­com­plish this: Given a set A of nat­u­ral num­bers, we define its nat­u­ral den­sity to be limn→∞ A(n)/​n, where A(n) de­notes the num­ber of el­e­ments of A that are at most n. At least, we can do this if the limit ex­ists. It doesn’t always. But when it does it does what we want pretty well. What if the limit doesn’t ex­ist? Well, we could use a lim­sup or a liminf in­stead, and get up­per and lower den­si­ties. Or take some other ap­proach, such as Sch­nirel­mann den­sity, where we just take an inf.

Of course, for sets of den­sity 0, this may not be enough in­for­ma­tion. Here we can pull out an­other trick from above: Don’t use num­bers, use func­tions! We can just ask what func­tion A(n) ap­prox­i­mates (asymp­tot­i­cally). For in­stance, the prime num­bers have den­sity 0, but a much more in­for­ma­tive state­ment is the prime num­ber the­o­rem, which states that if P is the set of prime num­bers, then P(n)~n/​(log n).


Of course, the real point of all these ex­am­ples was sim­ply to demon­strate: Depend­ing on what sort of thing you want to mea­sure, you’ll need differ­ent tools! So there’s many more tools out there, and some­times you may just need to in­vent your own...