Book Review: Naive Set Theory (MIRI research guide)

David_Kristoffersson14 Aug 2015 22:08 UTC

23 points

Machine Intelligence Research Institute (MIRI)Book Reviews / Media Reviews

I’m David. I’m reading through the books in the MIRI research guide and will write a review for each as I finish them. By way of inspiration from how Nate did it.

Naive Set Theory

Halmos Naive Set Theory is a classic and dense little book on axiomatic set theory, from a “naive” perspective.

Which is to say, the book won’t dig to the depths of formality or philosophy, it focuses on getting you productive with set theory. The point is to give someone who wants to dig into advanced mathematics a foundation in set theory, as set theory is a fundamental tool used in a lot of mathematics.

Summary

Is it a good book? Yes.

Would I recommend it as a starting point, if you would like to learn set theory? No. The book has a terse presentation which makes it tough to digest if you aren’t already familiar with propositional logic, perhaps set theory to some extent already and a bit of advanced mathematics in general. There are plenty of other books that can get you started there.

If you do have a somewhat fitting background, I think this should be a very competent pick to deepen your understanding of set theory. The author shows you the nuts and bolts of set theory and doesn’t waste any time doing it.

Perspective of this review

I will first refer you to Nate’s review, which I found to be a lucid take on it. I don’t want to be redundant and repeat the good points made there, so I want to focus this review on the perspective of someone with a bit weaker background in math, and try to give some help to prospective readers with parts I found tricky in the book.

What is my perspective? While I’ve always had a knack for math, I only read about 2 months of mathematics at introductory university level, and not including discrete mathematics. I do have a thorough background in software development.

Set theory has eluded me. I’ve only picked up fragments. It’s seemed very fundamental but school never gave me a good opportunity to learn it. I’ve wanted to understand it, which made it a joy to add Naive Set Theory to the top of my reading list.

How I read Naive Set Theory

Starting on Naive Set Theory, I quickly realized I wanted more meat to the explanations. What is this concept used for? How does it fit in to the larger subject of mathematics? What the heck is the author expressing here?

I supplemented heavily with wikipedia, math.stackexchange and other websites. Sometimes, I read other sources even before reading the chapter in the book. At two points, I laid down the book in order to finish two other books. The first was Gödel’s Proof, which handed me some friendly examples of propositional logic. I had started reading it on the side when I realized it was contextually useful. The second was Concepts of Modern Mathematics, which gave me much of the larger mathematical context that Naive Set Theory didn’t.

Consequently, while reading Naive Set Theory, I spent at least as much time reading other sources!

A bit into the book, I started struggling with the exercises. It simply felt like I hadn’t been given all the tools to attempt the task. So, I concluded I needed a better introduction to mathematical proofs, ordered some books on the subject, and postponed investing into the exercises in Naive Set Theory until I had gotten that introduction.

Chapters

In general, if the book doesn’t offer you enough explanation on a subject, search the Internet. Wikipedia has numerous competent articles, math.stackexchange is overflowing with content and there’s plenty additional sources available on the net. If you get stuck, do try playing around with examples of sets on paper or in a text file. That’s universal advice for math.

I’ll follow with some key points and some highlights of things that tripped me up while reading the book.

Axiom of extension

The axiom of extension tells us how to distinguish between sets: Sets are the same if they contain the same elements. Different if they do not.

Axiom of specification

The axiom of specification allows you to create subsets by using conditions. This is pretty much what is done every time set builder notation is employed.

Puzzled by the bit about Russell’s paradox at the end of the chapter? http://math.stackexchange.com/questions/651637/russells-paradox-in-naive-set-theory-by-paul-halmos

Unordered pairs

The axiom of pairs allows one to create a new set that contains the two original sets.

Unions and intersections

The axiom of unions allows one to create a new set that contains all the members of the original sets.

Complements and powers

The axiom of powers allows one to, out of one set, create a set containing all the different possible subsets of the original set.

Getting tripped up about the “for some” and “for every” notation used by Halmos? Welcome to the club:
http://math.stackexchange.com/questions/887363/axiom-of-unions-and-its-use-of-the-existential-quantifier
http://math.stackexchange.com/questions/1368073/order-of-evaluation-in-conditions-in-set-theory

Using natural language rather than logical notation is commmon practice in mathematical textbooks. You’d better get used to it:
http://math.stackexchange.com/questions/1368531/why-there-is-no-sign-of-logic-symbols-in-mathematical-texts

The existential quantifiers tripped me up a bit before I absorbed it. In math, you can freely express something like “Out of all possible x ever, give me the set of x that fulfill this condition”. In programming languages, you tend to have to be much more… specific, in your statements.

Ordered pairs

Cartesian products are used to represent plenty of mathematical concepts, notably coordinate systems.

Relations

Equivalence relations and equivalence classes are important concepts in mathematics.

Functions

Halmos is using some dated terminology and is in my eyes a bit inconsistent here. In modern usage, we have: injective, surjective, bijective and functions that are none of these. Bijective is the combination of being both injective and surjective. Replace Halmos’ “onto” with surjective, “one-to-one” with injective, and “one-to-one correspondence” with bijective.

He also confused me with his explanation of “characteristic function”—you might want to check another source there.

Families

This chapter tripped me up heavily because Halmos mixed in three things at the same time on page 36: 1. A confusing way of talking about sets. 2. Convoluted proof. 3. n-ary cartesian product.

Families are an alternative way of talking about sets. An indexed family is a set, with an index and a function in the background. A family of sets means a collection of sets, with an index and a function in the background. For Halmos build-up to n-ary cartesian products, the deal seems to be that he teases out order without explicitly using ordered pairs. Golf clap. Try this one for the math.se treatment: http://math.stackexchange.com/questions/312098/cartesian-products-and-families

Inverses and composites

The inverses Halmos defines here are more general than the inverse functions described on wikipedia. Halmos’ inverses work even when the functions are not bijective.

Numbers

The axiom of infinity states that there is a set of the natural numbers.

The Peano axioms

The peano axioms can be modeled on the the set-theoretic axioms. The recursion theorem guarantees that recursive functions exist.

Arithmetic

The principle of mathematical induction is put to heavy use in order to define arithmetic.

Order

Partial orders, total orders, well orders—are powerful mathematical concepts and are used extensively.

Some help on the way:
http://math.stackexchange.com/questions/1047409/sole-minimal-element-why-not-also-the-minimum
http://math.stackexchange.com/questions/367583/example-of-partial-order-thats-not-a-total-order-and-why
http://math.stackexchange.com/questions/225808/is-my-understanding-of-antisymmetric-and-symmetric-relations-correct
http://math.stackexchange.com/questions/160451/difference-between-supremum-and-maximum

Also, keep in mind that infinite sets like subsets of w can muck up expectations about order. For example, a totally ordered set can have multiple elements without a predecessor.

Axiom of choice

The axiom of choice lets you, from any collection of non-empty sets, select an element from every set in the collection. The axiom is necessary to do these kind of “choices” with infinite sets. In finite cases, one can construct functions for the job using the other axioms. Though, the axiom of choice often makes the job easier in finite cases so it is used where it isn’t necessary.

Zorn’s lemma

Zorn’s lemma is used in similar ways to the axiom of choice—making infinite many choices at once—which perhaps is not very strange considering ZL and AC have been proven to be equivalent.

robot-dreams offers some help in following the massive proof in the book.

Well ordering

A well-ordered set is a totally ordered set with the extra condition that every non-empty subset of it has a smallest element. This extra condition is useful when working with infinite sets.

The principle of transfinite induction means that if the presence of all strict predecessors of an element always implies the presence of the element itself, then the set must contain everything. Why does this matter? It means you can make conclusions about infinite sets beyond w, where mathematical induction isn’t sufficient.

Transfinite recursion

Transfinite recursion is an analogue to the ordinary recursion theorem, in a similar way that transfinite induction is an analogue to mathematical induction—recursive functions for infinite sets beyond w.

In modern lingo, what Halmos calls a “similarity” is an “order isomorphism”.

Ordinal numbers

The axiom of substitution is called the axiom (schema) of replacement in modern use. It’s used for extending counting beyond w.

Sets of ordinal numbers

The counting theorem states that each well ordered set is order isomorphic to a unique ordinal number.

Ordinal arithmetic

The misbehavior of commutativity in arithmetic with ordinals tells us a natural fact about ordinals: if you tack on an element in the beginning, the result will be order isomorphic to what it is without that element. If you tack on an element at the end, the set now has a last element and is thus not order isomorphic to what you started with.

The Schröder-Bernstein theorem

The Schröder-Bernstein theorem states that if X dominates Y, and Y dominates X, then X ~ Y (X and Y are equivalent).

Countable sets

Cantor’s theorem states that every set always has a smaller cardinal number than the cardinal number of its power set.

Cardinal arithmetic

Read this chapter after Cardinal numbers.

Cardinal arithmetic is an arithmetic where just about all the standard operators do nothing (beyond the finite cases).

Cardinal numbers

Read this chapter before Cardinal arithmetic.

The continuum hypothesis asserts that there is no cardinal number between that of the natural numbers and that of the reals. The generalized continuum hypothesis asserts that, for all cardinal numbers including aleph-0 and beyond aleph-0, the next cardinal number in the sequence is the power set of the previous one.

Concluding reflections

I am at the same time humbled by the subject and empowered by what I’ve learned in this episode. Mathematics is a truly vast and deep field. To build a solid foundation in proofs, I will now go through one or two books about mathematical proofs. I may return to Naive Set Theory after that. If anyone is interested, I could post my impressions of other mathematical books I read.

I think Naive Set Theory wasn’t the optimal book for me at the stage I was. And I think Naive Set Theory probably should be replaced by another introductory book on set theory in the MIRI research guide. But that’s a small complaint on an excellent document.

If you seek to get into a new field, know the prerequisites. Build your knowledge in solid steps. Which I guess, sometimes requires that you do test your limits to find out where you really are.

The next book I start on from the research guide is bound to be Computability and Logic.

What links here?

David_Kristoffersson14 Aug 2015 22:08 UTC

23 points

13 comments6 min readLW link Archive

Machine Intelligence Research Institute (MIRI)Book Reviews / Media Reviews

Vladimir_Nesov 15 Aug 2015 13:16 UTC
9 points
0
When I first worked through this book, it didn’t result in long-term retention of the material (I’m sure some people will be able to manage, just not me, not without meditating on it much longer than it takes to work through or setting up a spaced repetition system). In that respect, Enderton’s Elements of Set Theory worked much better. Enderton’s book goes into more detail, giving enough time to exercise intuition about standard proofs. At the same time, it’s an easier read, which might be helpful if Halmos’s text seems difficult.
- Tyrrell_McAllister 15 Aug 2015 15:15 UTC
  7 points
  0
  Parent
  In general, reading about the same subject from a different author is a great way to learn and retain the material better. This is true even if neither author is objectively “better” than the other. Something about recognizing the same underlying concept expressed in different words helps to fix that concept in the mind.
  
  It’s possible to exploit this phenomenon even when you have only one text to work with. One trick I use when working through a math text is to willfully use different notation in my notes next to the text. Using a different notation forces me to make sure that I’m really following the details of the argument. Expressing the same logic in different symbols makes it easier to see through those symbols to the underlying logic.
  - David_Kristoffersson 16 Aug 2015 10:23 UTC
    4 points
    0
    Parent
    The author of the Teach Yourself Logic study guide agrees with you about reading multiple sources:
    
    I very strongly recommend tackling an area of logic (or indeed any new area of mathematics) by reading a series of books which overlap in level (with the next one covering some of the same ground and then pushing on from the previous one), rather than trying to proceed by big leaps.
    
    In fact, I probably can’t stress this advice too much, which is why I am highlighting it here. For this approach will really help to reinforce and deepen understanding as you re-encounter the same material from different angles, with different emphases.
- David_Kristoffersson 16 Aug 2015 10:26 UTC
  4 points
  0
  Parent
  Thanks for the tip. Two other books on the subject that seem to be appreciated are Introduction to Set Theory by Karel Hrbacek and Classic Set Theory: For Guided Independent Study by Derek Goldrei.
  
  Edit: math.se weighs in: http://math.stackexchange.com/a/264277/255573
Toggle 15 Aug 2015 0:18 UTC
7 points
0
I finished this book about four months ago, and time is making me increasingly glad that I read it. In particular, its treatment of countable infinities, functions, proof by induction, and the Peano axioms have been worth their weight in gold. When I encounter similar subjects ‘out in the wild’, I can approach them with relative skill and trust my intuitions in a way that I couldn’t before. It’s really growing on me.

That said, as a near-introduction to set theory, it was a very difficult read at times. It was a treatment of mathematics far deeper than I had come to expect from my university courses (which were largely in continuous mathematics, according to ancient engineers’ tradition). If school has trained you to approach mathematical subjects as a tool the same way it did me, you’ll need to adjust your expectations. This book is about virtuosity, not just surveying the tools.
Andrew Quinn 23 Jun 2018 5:31 UTC
4 points
0
The inverses Halmos defines here are more general than the inverse functions described on wikipedia. Halmos’ inverses work even when the functions are not bijective.
I believe that what you are speaking of here is Halmos’s discourse on what are called these days “images and preimages” or “inverse images”. I found the subtle difference between these and inverse functions proper annoying when I was learning proof writing, so let me illustrate the concept, so that we have a caveat emptor for the budding mathematician.
Take the sets A = {0, 1} and B = {2}, and define a function f: A → B as f(x) = 2 for whatever x in A you throw in there.
Then,
- f(0) = 2, of course.
- f(1) = 2, as well.
- f(A) = {2}, which is the image of the whole set A “under” the function f.
- f^{-1}(B) = {0, 1}, which is the pre-image of the whole set of B under f. Meaning, “anything I can throw into f, from A, to get something in B”.
- f^{-1}(2) , however, is meaningless, at least as far as functions go. Functions can only return one thing, so how would you decide whether f^{-1}(2) should give you back 0 or 1?
If you say f^{-1}(2) should give back both, well, now you’re not dealing with an inverse function any more, you’re dealing with the inverse relation. You can in fact deal with that, with some other tools in the book
These are more general, which is nice, but I’ve found that in a rigorous environment it won’t do to describe them with the same language you use with functions. You really want to toy with these gently, if you can.
gjm 15 Aug 2015 14:00 UTC
1 point
0
Are you sure that by “one-to-one” Halmos means “bijective”? A more common usage is for it to mean “injective”. (But I don’t have NST and maybe he has an unusual idiom.)
- Tyrrell_McAllister 15 Aug 2015 15:24 UTC
  3 points
  0
  Parent
  There is a convention according to which a one-to-one function is injective, while a one-to-one correspondence is an injective function that is also surjective, ie, a bijection. (I don’t know whether Halmos uses this convention.)
  - gjm 15 Aug 2015 15:55 UTC
    1 point
    0
    Parent
    Oh yes, for sure, but the context here was a statement that “onto” means surjective while “one-to-one” means bijective. Definitely talking functions. And I would be really surprised if Halmos were using “one-to-one” followed by anything other than “correspondence” to mean bijective.
    - David_Kristoffersson 16 Aug 2015 9:53 UTC
      0 points
      0
      Parent
      You guys must be right. And wikipedia corroborates. I’ll edit the post. Thanks.
- stoat 15 Aug 2015 15:39 UTC
  2 points
  0
  Parent
  Looks to me like Halmos does intend “one-to-one” to mean “injective”. What he writes is “A function that always maps distinct elements onto distinct elements is called one-to-one (usually a one-to-one correspondence).” Then he mentions inclusion maps as examples of one-to-one functions.
  - David_Kristoffersson 16 Aug 2015 9:59 UTC
    0 points
    0
    Parent
    My two main sources of confusion in that sentence are:
    
    He says “distinct elements onto distinct elements”, which suggests both injection and surjection.
    He says “is called one-to-one (usually a one-to-one correspondence)”, which might suggest that “one-to-one” and “one-to-one correspondence” are synonyms—since that is what he usually uses the parantheses for when naming concepts.
    
    I find Halmos somewhat contradictory here.
    
    But I’m convinced you’re right. I’ve edited the post. Thanks.
    - ThisSpaceAvailable 19 Aug 2015 5:03 UTC
      2 points
      0
      Parent
      It is somewhat confusing, but remember that srujectivity is defined with respect to a particular codomain; a function is surjective if its range is equal to its codomain, and thus whether it’s surjective depends on what its codomain is considered to be; every function maps its domain onto its range. “f maps X onto Y” means that f is surjective with respect to Y”. So, for instance, the exponential function maps the real numbers onto the positive real numbers. It’s surjective with respect to positive real numbers*. Saying “the exponential function maps real numbers onto real numbers” would not be correct, because it’s not surjective with respect to the entire set of real numbers. So saying that a one-to-one function maps distinct elements onto a set of distinct elements can be considered to be correct, albeit not as clear as saying “to” rather than “onto”. It also suffer from a lack of clarity in that it’s not clear what the “always” is supposed to range over; are there functions that sometimes do map distinct elements to distinct elements, but sometimes don’t?