# Second-Order Logic: The Controversy

**Followup to**: Godel’s Completeness and Incompleteness Theorems

“So the question you asked me last time was, ‘Why does anyone bother with first-order logic at all, if second-order logic is so much more powerful?’”

Right. If first-order logic can’t talk about finiteness, or distinguish the size of the integers from the size of the reals, why even bother?

“The first thing to realize is that first-order theories can still have a *lot *of power. First-order arithmetic does narrow down the possible models by a lot, even if it doesn’t narrow them down to a *single *model. You can prove things like the existence of an infinite number of primes, because *every *model of the first-order axioms has an infinite number of primes. First-order arithmetic is never going to prove anything that’s *wrong *about the standard numbers. Anything that’s true in *all *models of first-order arithmetic will also be true in the *particular *model we call the standard numbers.”

Even so, if first-order theory is strictly weaker, why bother? Unless second-order logic is just as incomplete relative to third-order logic, which is weaker than fourth-order logic, which is weaker than omega-order logic -

“No, surprisingly enough—there’s tricks for making second-order logic encode any proposition in third-order logic and so on. If there’s a collection of third-order axioms that characterizes a model, there’s a collection of second-order axioms that characterizes the same model. Once you make the jump to second-order logic, you’re *done—*so far as anyone knows (so far as I know) there’s *nothing *more powerful than second-order logic in terms of *which models it can characterize*.”

Then if there’s one spoon which can eat anything, why not just use the spoon?

“Well… this gets into complex issues. There are mathematicians who don’t believe there *is *a spoon when it comes to second-order logic.”

Like there are mathematicians who don’t believe in infinity?

“Kind of. Look, suppose you *couldn’t* use second-order logic—you belonged to a species that doesn’t have second-order logic, or anything like it. Your species doesn’t have any native mental intuition you could use to construct the notion of ‘all properties’. And then suppose that, after somebody used first-order set theory to prove that first-order arithmetic had many possible models, you stood around shouting that you believed in only *one *model, what you called the *standard *model, but you couldn’t explain what made this model different from any other model—”

Well… a lot of times, even in math, we make statements that genuinely mean something, but take a while to figure out how to define. I think somebody who talked about ‘the numbers’ would mean something even before second-order logic was invented.

“But here the hypothesis is that you belong to a species that *can’t* invent second-order logic, or think in second-order logic, or anything like it.”

Then I suppose you want me to draw the conclusion that this hypothetical alien is just standing there shouting about standardness, but its words don’t mean anything because they have no way to pin down one model as opposed to another one. And I expect this species is also magically forbidden from talking about all possible subsets of a set?

“Yeah. They can’t talk about the largest powerset, just like they can’t talk about the smallest model of Peano arithmetic.”

Then you could arguably deny that shouting about the ‘standard’ numbers would mean anything, to the members of this particular species. You might as well shout about the ‘fleem’ numbers, I guess.

“Right. Even if all the members of this species *did *have a built-in sense that there was a special model of first-order arithmetic that was fleemer than any other model, if that fleem-ness wasn’t bound to anything else, it would be meaningless. Just a floating word. Or if all you could do was define fleemness as floobness and floobness as fleemness, you’d have a loop of floating words; and that might give you the impression that each particular word had a meaning, but the loop as a whole wouldn’t be connected to reality. That’s why it doesn’t help to say that the standard model of numbers is the smallest among all possible models of Peano arithmetic, if you can’t define ‘smallest possible’ any more than you can define ‘connected chain’ or ‘finite number of predecessors’.”

But second-order logic *does *seem to have consequences first-order logic doesn’t. Like, what about all that Godelian stuff? Doesn’t second-order arithmetic semantically imply… its own Godel statement? Because the unique model of second-order arithmetic doesn’t contain any number encoding a proof of a contradiction from second-order arithmetic? Wait, now I’m confused.

“No, that’s correct. It’s not paradoxical, because there’s no effective way of finding all the *semantic *implications of a collection of second-order axioms. There’s no analogue of Godel’s Completeness Theorem for second-order logic—no *syntactic* system for deriving *all *the semantic consequences. Second-order logic is *sound*, in the sense that anything syntactically provable from a set of premises, is true in any model obeying those premises. But second-order logic isn’t *complete*; there are semantic consequences you can’t derive. If you take second-order logic at face value, there’s no effectively computable way of deriving all the consequences of what you say you ‘believe’… which is a major reason some mathematicians are suspicious of second-order logic. What does it mean to believe something whose consequences you can’t derive?”

But second-order logic clearly has *some *experiential consequences first-order logic doesn’t. Suppose I build a Turing machine that looks for proofs of a contradiction from first-order Peano arithmetic. If PA’s consistency isn’t provable in PA, then by the Completeness Theorem there must exist nonstandard models of PA where this machine halts after finding a proof of a contradiction. So first-order logic doesn’t tell me that this machine runs forever—maybe it has nonstandard halting times, i.e., it runs at all standard N, but halts at −2* somewhere along a disconnected chain. Only second-order logic tells me there’s no proof of PA’s inconsistency *and therefore* this machine runs forever. Only second-order logic tells me I should *expect to see this machine keep runnin*g, and *not expect*—note falsifiability—that the machine ever halts.

“Sure, you just used a second-order theory to derive a consequence you didn’t get from a first-order theory. But that’s not the same as saying that you can *only *get that consequence using second-order logic. Maybe another first-order theory would give you the same prediction.”

Like what?

“Well, for one thing, first-order *set theory* can prove that first-order *arithmetic *has a model. Zermelo-Fraenkel set theory can prove the existence of a set such that all the first-order Peano axioms are true about that set. It follows within ZF that sound reasoning on first-order arithmetic will never prove a contradiction. And since ZF can prove that the syntax of classical logic is sound—”

What does it mean for *set theory* to prove that *logic *is sound!?

“ZF can quote formulas as structured, and encode models as sets, and then represent a finite ZF-formula which says whether or not a set of quoted formulas is true about a model. ZF can then prove that no step of classical logic goes from premises that are true inside a set-model, to premises that are false inside a set-model. In other words, ZF can represent the idea ‘formula X is semantically true in model Y’ and ‘these syntactic rules never produce semantically false statements from semantically true statements’.”

Wait, then why can’t ZF prove *itself *consistent? If ZF believes in all the axioms of ZF, and it believes that logic never produces false statements from true statements -

“Ah, but ZF can’t prove that there exists any set which is a model of ZF, so it can’t prove the ZF-axioms are consistent. ZF *shouldn’t* be able to prove that some set is a model of ZF, because that’s not true in all models. Many models of ZF don’t contain any *individual *set well-populated enough for that one set to be a model of ZF all by itself.”

I’m kind of surprised in a Godelian sense that ZF *can *contain sets as large as the universe of ZF. Doesn’t any given set have to be smaller than the whole universe?

“Inside *any particular model* of ZF, every set *within *that model is smaller than that model. But not all models of ZF are the same size; in fact, models of ZF of *every *size exist, by the Lowenheim-Skolem theorem. So you can have *some *models of ZF—some universes in which all the elements collectively obey the ZF-relations—containing individual sets which are larger than *other *entire models of ZF. A set that large is called a *Grothendieck universe* and assuming it exists is equivalent to assuming the existence of ‘strongly inaccessible cardinals’, sizes so large that you provably can’t prove inside set theory that anything that large exists.”

Whoa.

(Pause.)

But...

“But?”

I agree you’ve shown that *one *experiential consequence of second-order arithmetic—namely that a machine looking for proofs of inconsistency from PA, won’t be seen to halt—can be derived from first-order set theory. Can you get *all *the consequences of second-order arithmetic in some particular first-order theory?

“You can’t get all the *semantic *consequences of second-order logic, taken at face value, in *any *theory or *any *computable reasoning. What about the halting problem? Taken at face value, it’s a semantic consequence of second-order logic that any given Turing machine either halts or doesn’t halt—”

Personally I find it rather intuitive to imagine that a Turing machine either halts or doesn’t halt. I mean, if I’m walking up to a machine and watching it run, telling me that its halting or not-halting ‘isn’t entailed by my axioms’ strikes me as not describing any actual experience I can have with the machine. Either I’ll see it halt eventually, or I’ll see it keep running forever.

“My point is that the statements we *can *derive from the syntax of current second-order logic is limited by that syntax. And by the halting problem, we shouldn’t ever expect a computable syntax that gives us *all *the semantic consequences. There’s no possible theory you can *actually use* to get a correct advance prediction about whether an arbitrary Turing machine halts.”

Okay. I agree that no computable reasoning, on second-order logic or anything else, should be able to solve the halting problem. Unless time travel is possible, but even then, you shouldn’t be able to solve the expanded halting problem for machines that use time travel.

“Right, so the *syntax *of second-order logic can’t prove everything. And in fact, it turns out that, in terms of what you can *prove syntactically* using the standard syntax, second-order logic is identical to a many-sorted first-order logic.”

Huh?

“Suppose you have a first-order logic—one that doesn’t claim to be able to quantify over all possible predicates—which does allow the universe to contain two different sorts of things. Say, the logic uses lower-case letters for all type-1 objects and upper-case letters for all type-2 objects. Like, ‘∀x: x = x’ is a statement over all type-1 objects, and ‘∀Y: Y = Y’ is a statement over all type-2 objects. But aside from that, you use the same syntax and proof rules as before.”

Okay...

“Now add an element-relation x∈Y, saying that x is an element of Y, and add some first-order axioms for making the type-2 objects behave like collections of type-1 objects, including axioms for making sure that most describable type-2 collections exist—i.e., the collection X containing just x is guaranteed to exist, and so on. What you can *prove syntactically* in this theory will be identical to what you can prove using the standard syntax of second-order logic—even though the theory doesn’t claim that *all possible* collections of type-1s are type-2s, and the theory will have models where many ‘possible’ collections are missing from the type-2s.”

Wait, now you’re saying that second-order logic is no more powerful than first-order logic?

“I’m saying that the supposed power of second-order logic derives from *interpreting* it a particular way, and taking on faith that when you quantify over *all properties*, you’re actually talking about all properties.”

But then second-order arithmetic is no more powerful than first-order arithmetic in terms of what it can actually *prove*?

“2nd-order arithmetic is *way *more powerful than first-order arithmetic. But that’s because first-order set theory is more powerful than arithmetic, and adding the syntax of second-order logic corresponds to adding axioms with set-theoretic properties. In terms of which consequences can be *syntactically *proven, second-order arithmetic is more powerful than first-order arithmetic, but *weaker* than first-order set theory. First-order set theory can prove the existence of a model of second-order arithmetic—ZF can prove there’s a collection of numbers and sets of numbers which models a many-sorted logic with syntax corresponding to second-order logic—and so ZF can prove second-order arithmetic consistent.”

But first-order logic, including first-order set theory, can’t even *talk about* the standard numbers!

“Right, but first-order set theory can *syntactically prove *more statements about ‘numbers’ than second-order arithmetic can prove. And when you combine that with the *semantic *implications of second-order arithmetic not being computable, and with any second-order logic being syntactically identical to a many-sorted first-order logic, and first-order logic having neat properties like the Completeness Theorem… well, you can see why some mathematicians would want to give up entirely on this whole second-order business.”

But if you deny second-order logic you *can’t even say the word ‘finite’*. You would have to believe the word ‘finite’ was a *meaningless noise*.

“You’d define finiteness relative to whatever first-order model you were working in. Like, a set might be ‘finite’ only on account of the model not containing any one-to-one mapping from that set onto a smaller subset of itself—”

But that set wouldn’t *actually *be finite. There wouldn’t actually be, like, only a billion objects in there. It’s just that all the mappings which could *prove *the set was infinite would be mysteriously missing.

“According to some *other *model, maybe. But since there is no one true model—”

How is this not crazy talk along the lines of ‘there is no one true reality’? Are you saying there’s no *really *smallest set of numbers closed under succession, without all the extra infinite chains? Doesn’t talking about how these theories have multiple possible models, imply that those possible models are *logical thingies* and one of them actually *does *contain the largest powerset and the smallest integers?

“The mathematicians who deny second-order logic would see that reasoning as invoking an implicit background universe of set theory. Everything you’re saying makes sense *relative *to some *particular model* of set theory, which would contain possible models of Peano arithmetic as sets, and could look over those sets and pick out the smallest *in that model*. Similarly, that set theory could look over a proposed model for a many-sorted logic, and say whether there were any subsets *within *the larger universe which were missing from the many-sorted model. Basically, your brain is insisting that it lives inside some *particular *model of set theory. And then, from that standpoint, you could look over some *other *set theory and see how it was missing subsets that *your* theory had.”

Argh! No, damn it, I live in the set theory that *really does* have all the subsets, with no mysteriously missing subsets or mysterious extra numbers, or denumerable collections of all possible reals that could like totally map onto the integers if the mapping that did it hadn’t gone missing in the Australian outback -

“But *everybody *says that.”

Okay...

“Yeah?”

Screw set theory. I live in the *physical universe* where when you run a Turing machine, and keep watching forever *in the physical universe*, you never *experience a time* where that Turing machine outputs a proof of the inconsistency of Peano Arithmetic. Furthermore, I live in a universe where space is *actually *composed of a real field and space is *actually *infinitely divisible and contains *all *the points between A and B, rather than space only containing a denumerable number of points whose existence can be proven from the first-order axiomatization of the real numbers. So to talk about *physics—*forget about mathematics—I’ve got to use second-order logic.

“Ah. You know, that particular response is not one I have seen in the previous literature.”

Yes, well, I’m not a pure mathematician. When I ask whether I want an Artificial Intelligence to think in second-order logic or first-order logic, I wonder how that affects what the AI does in *the actual physical universe*. Here in the *actual physical universe* where times are followed by successor times, I *strongly suspect* that we should only expect to experience *standard *times, and not experience any nonstandard times. I think time is *connected*, and global connectedness is a property I can only talk about using second-order logic. I think that every *particular* time is finite, and yet time has no upper bound—that there are all finite times, but only finite times—and that’s a property I can only characterize using second-order logic.

“But if you can’t ever tell the difference between standard and nonstandard times? I mean, *local *properties of time can be described using first-order logic, and you can’t directly *see *global properties like ‘connectedness’ -”

But I *can *tell the difference. There are only nonstandard times where a proof-checking machine, running forever, outputs a proof of inconsistency from the Peano axioms. So I don’t expect to experience seeing a machine do that, since I expect to experience only standard times.

“Set theory can also prove PA consistent. If you use set theory to define time, you similarly won’t expect to see a time where PA is proven inconsistent—those nonstandard integers don’t exist in any model of ZF.”

Why should I anticipate that my physical universe is restricted to having only the nonstandard times allowed by a *more *powerful set theory, instead of nonstandard times allowed by first-order arithmetic? If I then talk about a nonstandard time where a proof-enumerating machine proves ZF inconsistent, will you say that only nonstandard times allowed by some still more powerful theory can exist? I think it’s clear that the way you’re deciding which experimental outcomes you’ll have to excuse, is by secretly assuming that *only standard times* exist regardless of which theory is required to narrow it down.

“Ah… hm. Doesn’t physics say this universe is going to run out of negentropy before you can do an infinite amount of computation? Maybe there’s only a bounded amount of time, like it stops before googolplex or something. That can be characterized by first-order theories.”

We don’t know that for certain, and I wouldn’t want to build an AI that just *assumed* lifespans had to be finite, in case it was wrong. Besides, should I use a different *logic *than if I’d found myself in Conway’s Game of Life, or something else really infinite? Wouldn’t the same sort of creatures evolve in that universe, having the same sort of math debates?

“Perhaps no universe like that *can *exist; perhaps only finite universes can exist, because only finite universes can be uniquely characterized by first-order logic.”

You just used the word ‘finite’! Furthermore, taken at face value, our own universe *doesn’t* look like it has a finite collection of entities related by first-order logical rules. Space and time both look like infinite collections of points—continuous collections, which is a second-order concept—and then to characterize the *size *of that infinity we’d need second-order logic. I mean, by the Lowenheim-Skolem theorem, there aren’t just *denumerable *models of first-order axiomatizations of the reals, there’s also *unimaginably large cardinal infinities* which obey the same premises, and that’s a possibility straight out of H. P. Lovecraft. Especially if there are any *things *hiding in the *invisible cracks of space*.”

“How could *you *tell if there were inaccessible cardinal quantities of points hidden inside a straight line? And anything that *locally *looks continuous each time you try to split it at a describable point, can be axiomatized by a first-order schema for continuity.”

That brings up another point: Who’d *really *believe that the reason Peano arithmetic works on physical times, is because that whole infinite axiom *schema *of induction, containing for every Φ a *separate *rule saying...

(Φ(0) ∧ (∀x: Φ(x) → Φ(Sx))) → (∀n: Φ(n))

...was used to specify our universe? How improbable would it be for an* infinitely long* list of rules to be true, if there wasn’t a unified reason for all of them? It seems much more likely that the *real *reason first-order PA works to describe time, is that all *properties *which are true at a starting time and true of the successor of any time where they’re true, are true of all later times; and this generalization over *properties* makes induction hold for *first-order formulas* as a special case. If my native thought process is first-order logic, I wouldn’t see the connection between each individual formula in the axiom schema—it would take separate evidence to convince me of each one—they would feel like independent mathematical facts. But after doing *scientific *induction over the fact that many *properties *true at zero, with succession preserving truth, seem to be true everywhere—after generalizing the simple, compact *second-order* theory of numbers and times—*then *you could invent an infinite first-order theory to approximate it.

“Maybe that just says you need to adjust whatever theory of scientific induction you’re using, so that it can more easily induct infinite axiom schemas.”

But why the heck would you *need *to induct infinite axiom schemas in the first place, if Reality *natively *ran on first-order logic? Isn’t it far more likely that the way we ended up with these infinite lists of axioms was that Reality was manufactured—forgive the anthropomorphism—that Reality was manufactured using an underlying schema in which time is a *connected *series of events, and space is a *continuous* field, and these are properties which happen to require second-order logic for humans to describe? I mean, if you picked out first-order theories at random, what’s the chance we’d end up inside an infinitely long axiom schema that just *happened *to look like the projection of a compact second-order theory? Aren’t we ignoring a sort of *hint*?

“A hint to what?”

Well, I’m not that sure myself, at this depth of philosophy. But I would currently say that finding ourselves in a physical universe where times have successor times, and space looks continuous, seems like a possible *hint *that the Tegmark Level IV multiverse—or the way Reality was manufactured, or whatever—might look more like *causal universes characterizable by compact second-order theories* than *causal universes characterizable by first-order theories*.

“But since any second-order theory can just as easily be *interpreted *as a many-sorted first-order theory with quantifiers that can range over either elements or sets of elements, how could using second-order syntax actually *improve *an Artificial Intelligence’s ability to handle a reality like that?”

Good question. One obvious answer is that the AI would be able to induct what *you* would call an infinite axiom schema, as a single axiom—a simple, finite hypothesis.

“There’s all *sorts *of obvious hacks to scientific induction of first-order axioms which would let you assign nonzero probability to computable infinite sequences of axioms—”

Sure. So beyond that… I would currently guess that the basic assumption behind ‘behaving as if’ second-order logic is true, says that the AI should act as if only the ‘actually smallest’ numbers will ever appear in physics, relative to some ‘true’ mathematical universe that it thinks it lives in, but knows it can’t fully characterize. Which is roughly what I’d say human mathematicians do when they take second-order logic at face value; they assume that there’s some *true *mathematical universe in the background, and that second-order logic lets them talk about it.

“And what behaviorally, experimentally distinguishes the hypothesis, ‘I live in the true ultimate math with fully populated powersets’ from the hypothesis, ‘There’s some random model of first-order set-theory axioms I happen to be living in’?”

Well… one behavioral consequence is suspecting that your time obeys an infinitely long list of first-order axioms with induction schemas for *every formula*. And then moreover believing that you’ll never experience a time when a proof-checking machine outputs a proof that Zermelo-Fraenkel set theory is inconsistent—even though there’s *provably *some models with times like that, which fit the axiom schema you just inducted. That sounds like secretly believing that there’s a background ‘true’ set of numbers that you think characterizes physical time, and that it’s the *smallest *such set. Another suspicious behavior is that as soon as you suspect Zermelo-Fraenkel set theory is consistent, you suddenly expect not to experience any *physical *time which ZF proves isn’t a standard number. You don’t think you’re in the nonstandard time of some weaker theory like Peano arithmetic. You think you’re in the minimal time expressible by *any and all theories*, so as soon as ZF can prove some number doesn’t exist in the minimal set, you think that ‘real time’ lacks such a number. All of these sound like behaviors you’d carry out if you thought there was a single ‘true’ mathematical universe that provided the best model for describing all *physical *phenomena, like time and space, which you encounter—and believing that this ‘true’ backdrop used the *largest *powersets and *smallest *numbers.

“How *exactly *do you formalize all that reasoning, there? I mean, how would you *actually *make an AI reason that way?”

Er… I’m still working on that part.

“That makes your theory a bit hard to criticize, don’t you think? Personally, I wouldn’t be surprised if any such *formalized* reasoning looked just like believing that you live inside a first-order set theory.”

I suppose I wouldn’t be *too *surprised either—it’s hard to argue with the results on many-sorted logics. But if comprehending the physical universe is best done by assuming that real phenomena are characterized by a ‘true’ mathematics containing *the *powersets and *the *natural numbers—and thus expecting that no mathematical model we can formulate will ever contain any larger powersets or smaller numbers than those of the ‘true’ backdrop to physics—then I’d call that a moral victory for second-order logic. In first-order logic we aren’t even supposed to be able to talk about such things.

“Really? To me that sounds like believing you live inside a model of first-order set theory, and believing that all models of any theories *you *can invent must *also *be sets in the larger model. You can prove the Completeness Theorem inside ZF plus the Axiom of Choice, so ZFC already proves that all consistent theories have models which are sets, although of course it can’t prove that ZFC itself is such a theory. So—anthropomorphically speaking—no model of ZFC *expects *to encounter a theory that has fewer numbers or larger powersets than itself. No model of ZFC expects to encounter any quoted-model, any set that a quoted theory entails, which contains larger powersets than the ones in its own Powerset Axiom. A first-order set theory can even make the leap from the finite statement, ‘P is true of all my subsets of X’, to believing, ‘P is true of all my subsets of X that can be described by this denumerable collection of formulas’ - it can encompass the jump from a single axiom over ‘all my subsets’, to a quoted axiom *schema *over formulas. I’d sum all that up by saying, ‘second-order logic is how first-order set theory feels from the inside’.”

Maybe. Even in the event that neither human nor superintelligent cognition will ever be able to ‘properly talk about’ unbounded finite times, global connectedness, particular infinite cardinalities, or true spatial continuity, it doesn’t follow that Reality is similarly limited in which physics it can privilege.

Part of the sequence *Highly Advanced Epistemology 101 for Beginners*

Previous post: “Godel’s Completeness and Incompleteness Theorems”

I kept expecting someone to object that “this Turing machine never halts” doesn’t count as a prediction, since you can never have observed it to run forever.

Then the statement “this Turing machine halts for every input” doesn’t count as a prediction either, because you can never have observed it for every input, even if the machine is just “halt”. And the statement “this Turing machine eventually halts” is borderline, because its negation doesn’t count as a prediction. What does this give us?

Perhaps there is a type theory for predictions, with concrete predictions like “The bus will come at 3 o’clock”, and functions that output concrete predictions like “Every monday, wednesday and friday, the bus will come at 3 o’clock” (consider the statement as a function taking a time and returning a concrete prediction yes or no).

An ultrafinitist would probably not argue with the existence of such a function, even though to someone other than an ultrafinitist, the function looks like it is quantifying over all time. From the ultrafinitist’s point of view, you’re going to apply it to some concrete time, and at that point it’s going to output a some concrete prediction.

Ultrafinitism doesn’t seem relevant here. The “type theory” you’re thinking of seems to be just the arithmetical hierarchy. A “concrete prediction” is a statement with only bounded quantifiers, “this Turing machine eventually halts” is a statement with one existential quantifier, “this Turing machine halts for every input” is a statement with one universal and one existential quantifier, and so on. Or am I missing something?

The arithmetical hierarchy is presuming a background of predicate logic; I was not presuming that. Yes, the type theory that I was gesturing towards would have some similarity to the arithmetical hierarchy.

I was trying to suggest that the answer to “what is a prediction” might look like a type theory of different variants of a prediction. Perhaps a linear hierarchy like the arithmetical hierarchy, yes, perhaps something more complicated. There could be a single starting type “concrete prediction” and a type constructor that, given source type and a destination type, gives the type of statements defining functions that take arguments of the source type and give arguments of the destination type.

The intuitive bridge for me from ultrafinitism to the question “what counts as a prediction?” is that ultrafinitists do happily work with entities like 2^1000 considered as a structure like (“^” “2” “1000″), even if they deny that those structures refer to things that are definitely numbers (of course, they can agree that they are definitely “numbers”, given an appropriately finitist definition of “number”). Maybe extreme teetotalers regarding what counts as a prediction would happily work with things such as computable functions returning predictions, even if they don’t consider them predictions.

Actually, I think “This Turing machine halts” is the more obviously troublesome one. It gives no computable expectation of when we might observe a halt. (Any computable probability distribution we would assign to halting time would put too much mass at small times as compared to the true distribution of halting times.)

If you take this objection seriously, then you should also take issue with predictions like “nobody will ever transmit information faster than the speed of light”, or things like it. After all, you can never actually observe the laws of physics to have been stable and universal for all time.

If nothing else, you can consider each as being a compact specification of an infinite sequence of testable predictions: “doesn’t halt after one step”, “doesn’t halt after two steps”,… “doesn’t halt after n steps”.

Technically speaking, you

canobserve the loop encoded in the Turing machine’s code somewhere—everynonhalting Turing machine has some kind of loop. The Halting theorems say that you cannot write down any finite program which will recognize and identifyanyinfinite loop, or deductively prove the absence thereof, in bounded time. However, human beings don’t have finite programs, and don’t work by deduction, so I suspect, with a sketch of mathematical grounding, that these problems simply don’t apply to us in the same way they apply to regular Turing machines.EDIT: To clarify, human minds aren’t “magic” or anything: the analogy between us and regular Turing machines with finite input and program tape just isn’t accurate. We’re a lot closer to inductive Turing machines or generalized Turing machines. We exhibit nonhalting behavior

by designand have more-or-less infinite input tapes.I think I would use “every” where you use “any.”

Let’s just use quantifiers.

Where P is a finite program running in finite time, M is a Turing machine, and i is an input string.

I don’t understand your notation. You have a disjunction of a formula and its negation, which is tautologically true, hence the whole statement is tautologically false.

P is a program; what meaning is intended by the predicate P(M,i)? In fact, what work is the object P doing in that formula? A Turing machine M is given an input string i; there is no other “program” involved.

No. These aren’t “loops” in any meaningful sense of the word. Failing to halt and going into an infinite loop are not the same thing.

I’d appreciate some explanation on that, to see if you’re saying something I haven’t heard before or if we’re talking past each-other. I don’t just include

`while(true);`

under “infinite loop”, I also include infinitely-expanding recursions that cannot be characterized as coinductive stepwise computations. Basically, anything that would evaluate to the`\Bot`

type in type-theory counts for “infinite loop” here, just plain computational divergence.I think that what Joshua was talking about by ‘infinite loop’ is ‘passing through the same state an infinite number of times.’ That is, a /loop/, rather than just a line with no endpoint. although this would rule out (some arbitrary-size int type) x = 0; while(true){ x++; } on a machine with infinite memory, as it would never pass through the same state twice. So maybe I’m still misunderstanding.

Ok. By that definition, yes these are the same thing. I don’t think this is a standard notion of what people mean by infinite loop though.

Formal proof needed; I in fact expect there to be something analogous to a Penrose tiling.

Moreover, to adapt Keynes’ apocryphal quote, a Turing machine can defer its loop for longer than you can ponder it.

And finally, as a general note, if you find that your proof that human beings can solve the halting problem can’t be made formal and concise, you might consider the possibility that your intuition is just plain wrong. It is in fact relevant that theoretical computer scientists seem to agree that the halting problem is not solvable by physical processes in the universe, including human beings.

I didn’t say that it

can’tbe made formal. I said that the formalism isn’t concise enough forone blog comment. Most normal journal/conference papers are about ten pages long, so that’s the standard of concision you should use for a claimed formalism in theoretical CS.And indeed, I think I can fit my formalism, when it’s done, into ten pages.

If anyone wants a go, here’s a Turing machine to try on. Does it halt?

.i=4.a=False.while not a:. a=True. for j in range(1,i):. k=i-j. b=False. for p in range(2,j):. b=b or j%p==0. for p in range(2,k):. b=b or k%p==0. a=a and b

Written in python for the convenience of coders.

Actually, the standard Turing machine has an infinite tape. And, the role of the tape in a Turing machine is equivalent to the input, output and working memory of a computer; the Turing machine’s “program” is the finite state machine.

It seems to me that the above suggests that humans are capable of cognitive tasks that Turing machines are not. But if this is true, it follows that human-level AGI is

notachievable via a computer, since computers are thought to be equivalent in power to Turing machines.First of all, swapped “induction” to “deduction”, because I’m a moron.

There’s no real semantic distinction between the original contents of the One Tape, or the finite contents of the Input Tape, or an arbitrarily complicated state-machine program, actually. You can build tape data for a Universal Turing Machine to simulate any other Turing machine.

No. You’re just making the analogy between Turing machines and real, physical computing devices overly strict. A simple Turing machine takes a finite input, has a finite program, and

eitherprocesses for finite time before accepting (possibly with something on an output tape or something like that), or runs forever.Real, physical computing devices, both biological and silicon, run coinductive (infinite-loop-until-it-isn’t) programs

all the time. Every operating system kernel, or message loop, or game loop is a coinductive program: its chief job is to iterate forever, taking a finite time to process I/O in each step. Each step performs some definite amount of semantic work, but there is an indefinite number of steps (generally: until a special “STOP NOW” input is given). To reiterate: because these programs both loop infinitely and perform I/O (with the I/O data not being computable from anything the program has when it begins running, not being “predictable” in any sense by the program), they are physically-realizable programs that simply don’t match the neat analogy of normal Turing machines.Likewise, human beings loop infinitely and perform I/O. Which more-or-less gives away half my mathematical sketch of how we solve the problem!

They match the neat analogy of Turing machines that start with possibly infinitely many non-blank squares on their tapes. All the obvious things you might like to know are still undecidable. Is this machine guaranteed to eventually read every item of its input? Is there any input that will drive the machine into a certain one of its states? Will the machine ever write a given finite string? Will it eventually have written every finite prefix of a given infinite string? Will it ever halt? These are all straightforwardly reducible to the standard halting problem.

Or perhaps you would add to the model an environment that responds to what the machine does. In that case, represent the environment by an oracle (not necessarily a computable one), and define some turn-taking mechanism for the machine to ask questions and receive answers. All of the interesting questions remain undecidable.

I agree with this:

and with this:

However, I don’t see how you are going to be able to use these facts to solve the halting problem. I’m guessing that, rather than statically examining the Turing machine, you will execute it for some period of time and study the execution, and after some finite amount of time, you expect to be able to correctly state whether or not the machine will halt. Is that the basic idea?

Basically, yes. “Halting Empiricism”, you could call it. The issue is precisely that you can’t

doempirical reasoning via deduction from a fixed axiom set (ie: a fixed, finite program`halts x : program → boolean`

). You need to do it by inductive reasoning, instead.What is inductive reasoning? Bayesian updating? Or something else?

My reply to you is the same as my reply to g_pepper: it’s easier for me to just do my background research, double-check everything, and eventually publish the full formalism than it is to explain all the details in a blog comment.

You are also correct to note that whatever combination of machine, person, input tape, and empirical data I provide, the X Machine can never solve the Halting Problem for the X Machine. The real math involved in my thinking here involves demonstrating the existence of an ordering: there should exist a sense in which some machines are “smaller” than others, and A can solve B’s halting problem when A is “strictly larger” than B, possibly strictly larger

by some necessary amount.(Of course, this already holds if A has an nth-level Turing Oracle, B has an mth-level Turing Oracle, and n > m, but that’s trivial from the definition of an oracle. I’m thinking of something that actually concerns physically realizable machines.)

Like I said: trying to go into extensive detail via blog comment will do nothing but

confusinglyunpack my intuitions about the problem, increasing net confusion. The proper thing to do isformalize, and that’s going to take a bunch of time.This is interesting. Do you have an outline of how a person would go about determining whether a Turing machine will halt? If so, I would be interested in seeing it. Alternatively, if you have a more detailed argument as to why a person will be able to determine whether an arbitrary Turing machine will halt, even if that argument does not contain the details of

howthe person would proceed, I would be interested in seeing that.Or, are you just making the argument that an intelligent person ought to be able in every case to use some combination of inductive reasoning, creativity, mathematical intuition, etc., to correctly determine whether or not a Turing machine will halt?

My reply to you is the same as my reply to Richard Kennaway: it’s easier for me to just do my background research, double-check everything, and eventually publish the full formalism than it is to explain all the details in a blog comment.

Fair enough. I am looking forward to seeing the formalism!

It’s possible to compute whether each machine halts using an inductive Turing machine like this:

Is this what you meant? If so, I’m not sure what this has to do with observing loops.

When you say that every nonhalting Turing machine has some kind of loop, do you mean the kind of loop that many halting Turing machines also contain?

No, I mean precisely the fact that it doesn’t halt. You can think of it as an infinitely growing recursion if that helps, but what’s in question is really precisely the nonhalting behavior.

I’m going to make a Discussion post about the matter, since Luke wants me to share the whole informal shpiel on the subject.

It is possible to make a Turing machine that returns “HALT”, “RUNS FOREVER” or “UNSURE” and is never wrong. If the universe as we know it runs on quantum mechanics, it should be possible to simulate a mathematician in a box with unlimited data storage and time, using a finite Turing machine. This would imply the existence of some problem that would leave the mathematician stuck forever. There are plenty of Turing machines where we have no Idea if they halt. If you describe the quantum state of the observable universe to plank length resolution, what part of humans supposedly infinite minds is not described in these 2^2^100 bits of information?

“This Turing machine won’t halt in 3^^^3 steps” is a falsifiable prediction. Replace 3^^^3 with whatever number is enough to guarantee whatever result you need.

Edit: But you’re right.

Really? First, 3^^^3 is an absurdly large number of steps, much more steps than you could feasibly go through before the heat death of the universe or whatever horrible fate awaits us. Second, after enough steps your confidence in the outcome of any computation can be made arbitrarily low because of possible errors, e.g. errors caused by cosmic rays.

I already saw your comments; I upvoted you. If this objection was made in your thread, I would have said, “Assuming ultrafinitism is wrong...”

It seemed to me that ultrafinitism wasn’t his true rejection.

Edit: But you’re right.

Okay, non-ultrafinitist objection:

But how do you know what number that is? The function “the number of steps needed to guarantee that a Turing machine won’t halt” is not a computable function of Turing machines, because if it were you could solve the halting problem. So how do you compute a function that’s, y’know, not computable?

First of all, I shouldn’t have used the number 3^^^3. That was a stupid idea, and I’ve forgotten what the notation means. So just replace that with “generic really big number”, let’s just say a googolplex (and, yes, I’m aware that’s nowhere close to 3^^^3).

To answer your question, I doubt FAI is going to care about whether Turing machine X halts; it’s going to be looking for a proof of a certain result regarding its own algorithms. It’s possible that it only needs to prove that its algorithms will work for “generic really big number” amount of steps, and thus could use a bounded quantifier instead of a universal quantifier. Using bounded quantifiers makes your system much weaker, but also moves it closer to being decidable (I mean this in a metaphorical sense; the system still won’t be decidable). Also, regarding searching for proofs, we’ll have to set an upper bound for how long we search, as well. So we need to specify some specific number in that context, too.

Specifically

whichnumbers to select is something I have no idea about; I’m not an AI researcher. Maybe I’ll have some kind of idea (but very probably not) when a more formal problem specification is given, but this article is still discussing theoreticals, not applications yet.Are you aware of the busy beaver function? Read this.

Basically, it’s impossible to write down numbers large enough for that to work.

Interesting article! What do you mean, though? Are you saying that Knuth’s triple up-arrow is uncomputable (I don’t see why that would be the case, but I could be wrong.)?

No, Solvent is making the same point as in my answer: in order to write down large enough numbers to guarantee that Turing machines don’t halt, you need to write down a function larger than the busy beaver function, and any such function fails to be computable because you can use it to solve the halting problem.

Ah, I see now, thanks!

Basically, the busy beaver function tells us the maximum number of steps that a Turing machine with a given number of states and symbols can run for. If we know the busy beaver of, for example, 5 states and 5 symbols, then we can tell you if any 5 state 5 symbol Turing machine will eventually halt.

However, you can see why it’s impossible to in general find the busy beaver function- you’d have to know which Turing machines of a given size halted, which is in general impossible.

One of the participants in this dialogue seems too concerned with pinning down models uniquely and also seems too convinced he knows what model he’s in. Suppose we live in a simulation which is being run by superbeings who have access to oracles that can tell them when Turing machines are attempting to find contradictions in PA. Whenever they detect that something in the simulation is attempting to find contradictions in PA, that part of the simulation mysteriously stops working after the billionth or trillionth step or something. Then running such Turing machines can’t tell us whether we live in a universe where PA is consistent or not.

I also wish both participants in the dialogue would take ultrafinitism more seriously. It is not as wacky as it sounds, and it seems like a good idea to be conservative about such things when designing AI.

Edit:Here is an ultrafinitist fable that might be useful or at least amusing, from the link.For what it’s worth, I’m an ultrafinitist. Since 2005, at least as far as I’ve been able to tell.

How long do you expect to stay an ultrafinitist?

Until I’m destroyed, of course!

… but since Qiaochu asked that we take ultrafinitism seriously, I’ll give a serious answer: something else will probably replace ultrafinitism as my preferred (maximum a posteriori) view of math and the world within 20 years or so. That is, I expect to determine that the question of whether ultrafinitism is true is not quite the right question to be asking, and have a better question by then, with a different best guess at the answer… just because similar changes of perspective have happened to me several times already in my life.

Is that because 2005 is as far from the present time as you dare to go?

What precisely do the Overflowers (and the mathematician reporting that anecdote) mean by the “existence” of a number?

For instance, I see that the anecdote-reporter refers to there being a series of points of a particular length, but I assume they don’t mean that in an intuitive, literal sense: there are certainly at least 2^100 Planck lengths between me and the other end of the room.

I am not sure. If I tabooed “exist,” then my best guess is that ultrafinitists would argue that statements involving really big numbers are not meaningful. For example, they might argue that such statements are not verifiable in the real world. (Edit: as another example, as I mentioned in another comment, ultrafinitists might argue that you cannot count to really big numbers.)

Yes, but just barely: 2^100 Planck lengths is something like 2 x 10^{-5} meters, so substitute 2^1000 Planck lengths, which is substantially larger than the diameter of the universe.

Seems weird to think that some of the possible configurations of bits on my 1.5TB hard drive don’t exist. Which ones? I hope none of the really good collections of pr0n are logically unreachable.

If that number does exist, then what about really big busy beaver numbers, like bb(2^10^13 )? They’re just a series of computations on hard drive contents. And that number is so close to infinity that we might as well just step from ultrafinitism to plain old finitism.

While I am not an ultrafinitist, I believe the idea is meant to be this: It is not valid to talk about those numbers, because there is no meaningful thing you can do with those numbers that can affect the real world. Therefore, the ultrafinitists say that it is not really logical to treat these numbers as “existing” as they can not affect the real world at all, and why say that something exists if it cannot affect anything at all?

This seems incredibly likely, doesn’t it? (As long as we are happy to bound ‘logically reachable’ to within the observable universe.)

Would you like to go through all of them just to be sure? How long do you think that will take you?

Trying to actually compute a sufficiently large busy beaver number, you’ll run into the problem that there won’t be enough material in the observable universe to construct the corresponding Turing machines and/or that there won’t be enough usable energy to power them for the required lengths of time and/or that the heat death of the universe will occur before the required lengths of time. If there’s no physical way to go through the relevant computations, there’s no physical sense in which the relevant computations output a result.

It may not be possible to check

allof them, but it certainly is possible to check one of them...anyone of them. And whichever one you choose to check, you’ll find that it exists. So if you claim that some of the possible configurations don’t exist, you’re claiming they’d have to be among the ones you don’t choose to check. But wait, this implies that your choice of which one(s) to check somehow affects which ones exist. It sure would be spooky if that somehow turns out to be the case, which I doubt.Exactly. And I could make my choice of which pr0n library to check—or which 1.5TB turing machine to run—dependent on 10^13 quantum coinflips; which, while it would take a while, seems physically realizable.

Help me out here…

I can imagine living a

simulation… I just don’t understand yet what you mean by living in a model in the sense of logic and model theory, because a model is a static thing. I heard someone once before talk about “what are we in?”, as though the physical universe were a model, in the sense of model theory. He wasn’t able to operationalize what he meant by it, though. So, what do you mean when you say this? Are you considering the physical universe a first-order structure) somehow? If so, how? And concerning its role as a model, what formal system are you considering it a modelof?That was imprecise, but I was trying to comment on this part of the dialogue using the language that it had established:

I was also commenting on this part:

The point I was trying to make, and maybe I did not use sensible words to make it, is that This Guy (I don’t know what his name is—who writes a dialogue with unnamed participants, by the way?) doesn’t actually know that, for two reasons: first, Peano arithmetic might actually be inconsistent, and second, even if it were consistent, there might be some mysterious force preventing us from discovering this fact.

Models being static is a matter of interpretation. It is easy to write down a first-order theory of discrete dynamical systems (sets equipped with an endomap, interpreted as a successor map which describes the state of a dynamical system at time t + 1 given its state at time t). If time is discretized, our own universe could be such a thing, and even if it isn’t, cellular automata are such things. Are these “static” or “dynamic”?

Indeed, I think it’s somewhat unclear what is meant here. The speaker attempts to relate it to physics, referring to the idea that we appear to live in continuous space… but how does the speaker propose to rule out infinitesimals and other nonstandard entities? (The speaker only seems to indicate horror about devils living in the cracks.) Or, for that matter, countable models of the reals, as someone already mentioned. This isn’t directly related to the question of what set theory is true, what set theory we live in, etc… (Perhaps the speaker’s intention in this line was to assume that we live in a Tegmark multiverse, so that we literally do live in some set theory?)

Instead, I think the speaker should have argued that we

can referto this state of affairs, not that it must be the true state of affairs. To give another example, I’m not at all convinced that time must correspond to the standard model of the natural numbers (I’m not even sure it doesn’t loop back upon itself eventually, when it comes down to it, though I agree that causal models disallow this and I find it improbable for that reason). Yet, I’m (relatively) happy to say that we can at least refer to this as a possible state of affairs. (Perhaps with a qualifier: “Unless peano arithmetic turns out to be inconsistent...”)Ah, I was asking you because I thought using that language meant you’d made sense of it ;) The language of us “living in a (model of) set theory” is something I’ve heard before (not just from you and Eliezer), which made me think I was missing something. Us living in a dynamical system makes sense, and a dynamical system can contain a model of set theory, so at least we can “live

with” models of set theory… we interact with (parts of) models of set theory when we play with collections of physical objects.Of course, time has been a fourth dimension for ages ;) My point is that set theory doesn’t seem to have a reasonable dynamical interpretation that we could

live in, and I think I’ve concluded it’s confusing to talk like that. I can only make sense of “living with” or “believing in” models.Set theory doesn’t have a dynamical interpretation because it’s not causal, but finite causal systems have first-order descriptions and infinite causal systems have second-order descriptions. Not everything logical is causal; everything causal is logical.

I like your proposal, but why not just standard finitism? What is your objection to primitive recursive arithmetic?

As a side note, I personally consider a theory “safe” if it fits with my intuitions and can be proved consistent using primitive recursive arithmetic and transfinite induction up to an ordinal I can picture. So I think of Peano arithmetic as safe, since I can picture epsilon-zero.

This isn’t my objection personally, but a sufficiently ultra finitist rejects the principle of induction.

I think it’s more that an ultrafinitist claims not to know that successor is a total function—you could still induct for as long as succession lasts. Though this is me guessing, not something I’ve read.

Either that, or they claim that certain numbers, such as 3^^^3, cannot be reached by any number of iterations of the successor function starting from 0. It’s disprovable, but without induction or cut it’s too long a proof for any computer in the universe to do in a finite human lifespan, or even the earth’s lifespan.

Alternatively, one could start by asking whether 2^50 is a real number or not, if yes go up to 2^75, if no go to 2^25, and in up to 7 steps find a real number that, when doubled, ceases to be a real number. There may be

impracticalor evennoncomputablenumbers, but continuity holds that doubling a real number always yields a real number.I think the point of the fable is that Yesenin-Volpin was counting to each number in his head before declaring whether it was ‘real’ or not, so if you asked him whether 2^50 was ‘real’ he’d just be quiet for a really really long time.

But wouldn’t that disprove ultrafinitism? All finite numbers, even 3^^^3,

canbe counted to (in the absence of any time limit, such as a lifespan), there’s just no human who reallywantsto.As I understand it, this is precisely the kind of statement that ultrafinitists do not believe.

If that’s true, then Yesenin-Volpin was just playin the role of a Turing machine trying to determine if a certain program halts (the program that counts from 0 to the input). If 3^^^3 (say) for a finitist doesn’t exists, then s/he really has a different concept of number than you have (for example, they are not axiomatized by the Peano Arithmetic). It’s fun to observe that ultrafinitism is axiomatic: if it’s a coherent point of view, it cannot prove that a certain number doesn’t exists, only assume it. I also suspect (but finite model theory is not my field at all) that they have an ‘inner’ model that mimics standard natural numbers...

Well, that’s what the anti-ultrafinitists say. It is precisely the contention of the ultrafinitists that you couldn’t “count to 3^^^3”, whatever that might mean.

Hmm.

So, it’s not sufficient to define a set of steps that determine a number… it must be possible to execute them? That’s a rather pragmatic approach. Albeit it one you’d have to keep updating if our power to compute and comprehend lengthier series of steps grows faster than you predict.

No, ultrafinitism is not a doctrine about our practical counting capacities. Ultrafinitism holds that you may not have actually denoted a number by ‘3^^^3’, because

there is no such number.Utlrafrinitists tend no to specfify the highest number, to prevent people adding one to it.

Hence “may not”

I would have done the following if I had been asked that: calculate which numbers I would have time to count up to before I was thrown out/got bored/died/earth ended/universe ran out of negentropy. I would probably have to answer I don’t know, or I think X is a number for some of them, but it’s still an answer, and until recently people could not say wether “the smallest n>2 such that there are integers a,b,c satisfying a^n + b^n = c^n” was a number or not.

I’m not advocating any kind of finitism, but I agree that the position should be taken seriously.

There is a lot of psuedo-arguing going on here. I’m a mathematician who specializes in logic and foundations, I think I can settle most of the controversy with the following summary:

1) Second-order logic is a perfectly acceptable way to formalize mathematics-in-general, stronger than Peano Arithmetic but somewhat weaker than ZFC (comparable in proof-theoretic strength to “Type Theory” of “Maclane Set Theory” or “Bounded Zermelo Set Theory”, which means sufficient for almost all mainstream mathematical research, though inadequate for results which depend on Frankel’s Replacement Axiom).

2) First-order logic has better syntactic properties than Second-order logic, in particular it satisfies Godel’s Completeness Theorem which guarantees that the syntax is adequate for the semantics. Second-order logic has better semantic properties than first-order logic, in particular avoiding nonstandard models of the integers and of set-theoretical universes V_k for various ordinals k.

3) The critics of second-order logic say that they don’t understand the concept of “finite integer” and “all subsets” and that the people using second-order logic are making meaningless noises, and all we really can be sure about is ZFC and its consequences, and since the ZFC axioms actually allow us to prove more theorems about objects we care about than the usual formalizations of second-order logic (such as, for example, the consistency of full Zermelo set theory), the fans of second-order logic should just quit and use ZFC instead

4) the fans of second-order logic reply that nothing we actually prove and translate into the language of set theory is false and our reasoning is sound by your standards so stop insulting us by claiming that since YOU can’t decide what’s true about sets, OUR notion of “property” or “subset” is vague and incoherent. We say that CH is either true of false but we don’t know which because we don’t have a deductive calculus that settles it which we are confident in, you say that CH is vague and meaningless, but this is a pseudo-problem until either you come up with new set-theoretical axioms which decide CH or we come up with a stonger deductive system which decides it. We can still prove “ZF proves Con(Z)”, we just think that the theorems we can derive in pure second-order logic are epistemologically better supported than theorems you need the full power of ZFC to prove (because they already follow from “logical” rather than “mathematical” axioms) and you shouldn’t mind us making this distinction.

5) The best theories of physics we have need real infinities and finitely many iterations of the Powerset axiom but Second Order Logic or “Type Theory” is completely adequate for them and it is hard to conceive that questions about higher infinites that require infinitely many iterations of the Powerset operation can be relevant to physics.

6) It is also hard to conceive that any form of mathematical finitism is adequate for making actual progress in theoretical physics, even if results that are proved using standard infinitary mathematics can be reformulated as finitary statements about integers and computations, these nominalistic reductions are so cumbersome that the self-imposed avoidance of infinitary ontology cripples physical insight.

(Since the average LW reader may not know whether to trust that this commenter is a mathematician specializing in logic and foundations, I remark that the above summary sounds much like the papers I’ve read on second-order logic. Though ‘pseudo-arguing’ is an odd way to describe the ancient expositional tradition of dialogue.)

If sets with cardinalities equal to these higher infinities exist as mathematical entities, then some of them will be logically dependent on us, on what we do. For example those sets may contain intelligent beings who are simulating our universe and taking different actions based on what they observe. I don’t know if this is actually the case, or if it is, how much we ought to care, but it would be nice if an FAI could consider and try to answer questions like this, and not be arbitrarily limited in its reasoning by the choice of mathematical language we make when building it.

Best mathematical comment on LessWrong in four years.

I applaud you, sir or madam.

I confess that I lost track of the reasoning about which-order-logic-can-do-what-and-why somewhere in the last post or so. I’m also not clear how and why it is important in understanding this “Highly Advanced Epistemology 101 for Beginners”. I’m wondering whether (or how) the ability to “properly talk about unbounded finite times, global connectedness, particular infinite cardinalities, or true spatial continuity” is essential or even important for (F)AI research.

Me too.

It’s the buildup to the “open problems in FAI”. Large parts of the internals of an AI look like systems for reasoning in rigorous ways about math, models, etc.

If that were the reasoning, it’d be nice if he came out and explained why he believes that to be the case. Becuase just about any A(G)I researcher would take issue with that statement...

Maybe we need a handy summary table of the which’s, what’s, and why’s...

I assume we need to ask some of these questions in order to decide if, or in what sense, an AGI needs second-order logic.

This and previous articles in this series emphasize attaching meaning to sequences of symbols via discussion and gesturing toward models. That strategy doesn’t seem compatible with your article regarding sheep and pebbles. Isn’t there a way to connect sequences of symbols to sheep (and food and similar real-world consequences) directly via a discipline of “symbolcraft”?

I thought pebblecraft was an excellent description of how finitists and formalists think about confusing concepts like uncountability or actual infinity: Writing down ”… is uncountably infinite …” is part of our pebblecraft, and we have some confidence that when we get to the bottom line of the formal argument, we’ll be able to interpret the results “in the usual way”. The fact that someone (perhaps an alien) might watch our pebblecraft and construe us (in medias res) to be referring to (abstract, mathematical) things other than the (abstract, mathematical) things that we usually construe ourselves to be referring to doesn’t stop the pebblecraft magic from working.

I think I agree. This gives me the feeling that every time the discussion revolves around models, it is getting off the point. We can touch proof systems. We can’t touch models.

We

cansay that models are part of our pebblecraft...The question, though, is whether we should trust particular parts of our pebblecraft. Should we prefer to work with first-order logic, or 2nd-order logic? Should we believe in such a thing as a standard model of the natural numbers? Should we trust proofs which make use of those concepts?

Feeling our way into a new formal system is part of our (messy, informal) pebblecraft. Sometimes people propose formal systems starting with their intended semantics (roughly, model theory). Sometimes people propose formal systems starting with introduction and elimination rules (roughly, proof theory). If the latter, people sometimes look for a semantics to go along with the syntax (and vice versa, of course).

For example, lambda calculus started with rules for performing beta reduction. In talking about lambda calculus, people refer to it as “functions from functions to functions”. Mathematicians knew that there was no nontrivial set S with a bijection to the set of all functions from S to S. So something else must be going on. Dana Scott invented domain theory partially to solve this problem. Domains have additional structure, such that some domains D do have bijections with the set of all

structure-preservingmaps from D to D. http://en.wikipedia.org/wiki/Domain_theory Similarly, modal logics started with just proof theory, and then Saul Kripke invented a semantics for them. http://en.wikipedia.org/wiki/Kripke_semanticsThere’s always a syntactical model which you can construct mechanically from the proof theory (at least, that’s my understanding of the downward Lowenheim-Skolem argument) - but Scott and Kripke were not satisfied with that syntactical model, and went looking for something else, more insightful and perspicuous. Adding a “semantic” understanding can increase our (informal) confidence in a formal system—even a formal system that we were already working with. I’m not sure why it helps, but I think it’s part of our pebblecraft that it does help.

Perhaps adding a “semantic” understanding is like another bridge between informal and formal reasoning. These bridges are only partly formal—they’re also partly informal, the concepts and gesturing around the equations and proofs. Having one bridge is sufficient to go onto the Island of Formality, do some stuff, and come off again, but might be more convenient to have two, or three.

Nice. This seems like a surprisingly well-motivated way of reducing everything to physics: there’s just “syntactic” machinery made out of physics, and any semantics that might be attributed to parts of this machinery is merely a partially informal device (i.e. a collection of cognitive skills) that human mathematicians might use as an aid for reasoning about the machinery. Even when the machinery itself might in some sense be said to be semantically reasoning about something or other, this description of the machinery can be traced back to human mathematicians who use it as a partially informal device for understanding the machinery, and so it won’t strictly speaking be a property of the machinery itself.

In other words, in this view semantics is an informal art primarily concerned with advancement of

humanunderstanding, and it’s not fundamental to the operation of intelligence in general, it’s not needed for properly designing things, responding to observations or making decisions, any more than curiosity or visual thinking.Ok, right. I think I didn’t fully appreciate your point before. So the fact that a particular many-sorted first-order logic with extra axioms is proof-theoretically equivalent to (a given proof system for) 2nd-order logic should make me stop asking whether we should build machines with one or the other, and start asking instead whether the many-sorted logic with extra axioms is any better than plain first-order logic (to which the answer appears to be yes, based on our admittedly 2nd-order reasoning). Right?

The prevalence of encodings means that we might not be able to “build machines with one or the other”. That is, given that there are basic alternatives A and B and A can encode B and B can encode A, it would take a technologist specializing in hair-splitting to say whether a machine that purportedly is using A is “really” using A at its lowest level or whether it is “actually” using B and only seems to use A via an encoding.

If in the immediate term you want to work with many-sorted first order logic, a reasonable first step would be to implement single-sorted first order logic, and a preprocessor to translate the many-sorted syntax into the single-sorted syntax. Then further development, optimizations and so on, might complicate the story, compacting the preprocessor and single-sorted engine into a single technology that you might reasonably say implements many-sorted logic “natively”.

The prevalence of encodings means that apparently different foundations don’t always make a difference.

In this case, that’s not true. The many-sorted logic, with axioms and all to emulate 2nd-order logic, has different properties than plain 1st-order logic (even though we may be emulating it in a plain 1st-order engine).

For example, in 2nd-order logic, we can quantify over any names we use. In 1st-order logic, this is not true: we can quantify over 1st-order entities, but we are unable to quantify over 2nd-order entities. So, we can have named entities (predicates and relations) which we are unable to quantify over.

One consequence of this is that, in 1st-order logic, we can never prove something non-tautological about a predicate or relation without first making some assumptions about that specific predicate or relation. Statements which share no predicates or relations are logically independent!

This limits the influence of concepts on one another, to some extent.

This sort of thing does not hold for 2nd-order logic, so its translation doesn’t hold for the translation of 2nd-order logic into 1st-order logic, either. (Basically, all statements in this many-sorted logic will use the membership predicate, which causes us to lose the guarantee of logical independence for other named items.)

So we have to be careful: an encoding of one thing into another thing doesn’t give us everything.

Suggestion: if your goal is AI, maybe abandon logic and set theory in favor of type theory. It seems better suited to computers anyway thanks to computational trinitarianism. The problem of pinning down models uniquely seems to be solved by something called the univalence axiom, which I currently don’t understand at all.

The univalence axiom has nothing to do with pinning down models (except perhaps that if you are talking about models internally, it says that you’re pinning them down up to isomorphism rather than up to some notion of syntactic equality). Non-standard models are essential for getting an intuition about what is and isn’t provable; most of the arguments that I’ve seen for unexpectedly interesting terms involve picking a non-standard model, finding a point in that model with unexpected properties (e.g., something that is false in the standard model), and then construct that point. See, for example, this post, which constructs a nonstandard point on the circle, one for which you can’t exhibit a path from it to the basepoint of the circle. More generally, see The Elements of an Inductive Type for a good explanation of why non-standard models are inescapable.

I guess that understanding univalence axiom would be helped by understanding the implicit equality axioms.

Univalence axiom states that two isomorphic types are equal; this means that if type A is isomorphic to type B, and type C contains A, C has to contain B (and a few similar requirements).

Requiring that two types are equal if they are isomorphic means prohibiting anything that we can write to distinguish them (i.e. not to handle them equivalently).

Just so you know… I want to upvote you for linking to that awesome type theory article (although JeremyHahn posted it to this discussion already), but I also want to vote you down for computational trinitarianism (propositions are not types! arggggg!). ;)

I’m generally not on board with the whole “programs are proofs” dogma. When I look into the details, all it says is that a successful type-check of a program gives a proof of the existence of a program of that type. In expressively powerful languages, this seems very uninteresting; so far as I know, you can easily construct a program of any type by just discarding the input and then creating an output of the desired type. (Handling the empty type is a little tricky, but I wonder if infinite loops or program failures would do.) So my understanding is that you only get the interesting type structure (where it isn’t trivial to prove the existence of a type) by restricting the language. (Typed lambda calculus is not Turing-complete.)

So, while I see the curry-howard stuff as beautiful math, I don’t think it justifies the ‘programs as proofs’ mantra that many have taken up! I don’t see the appeal.

I think Harper (who coined the phrase “computational trinitarianism”) doesn’t claim that “programs are proofs”, but rather that proofs are programs. That is, proofs should be written down in typed calculi and (to be intuinistically acceptable) should have computational content. But programming languages and type systems can be used for more things than just proving theorems.

Actually, by discarding the input and then falling into an infinite loop (you cannot construct the output of the desired type without the input).

I don’t know what you mean. Suppose you want a function A → Int, where A is some type. \x:A. 6, the function which takes x (of type A) and outputs 6, seems to do fine. To put it in c-like form, int six(x) {return 6}.

If programs are proofs, then general programming languages correspond to trivialism.

Am I missing something?

The type constructors that you’re thinking of are Arrow and Int. Forall is another type constructor, for constructing generic polymorphic types. Some types such as “Forall A, Forall B, A → B” are uninhabited. You cannot produce an output of type B in a generic way, even if you are given access to an element of type A.

The type corresponding to a proposition like “all computable functions from the reals to the reals are continuous” looks like a function type consuming some representation of “a computable function” and producing some representation of “that function is continuous”. To represent that, you probably need dependent types—this would be a type constructor that takes an element, not a type, as a parameter. Because not all functions are continuous, the codomain representing “that function is continuous” isn’t in general inhabited. So building an element of that codomain is not necessarily trivial—and the process of doing so amounts to proving the original proposition.

What I’m trying to say is that the types that type theorists use look a lot more like propositions than the types that mundane programmers use.

...except in Haskell, where we can make functions for generating instances given a desired type. (Such as Magic Haskeller.) I still feel like it’s accurate to say that general-purpose languages correspond to trivialism.

Thanks, though! I didn’t think of the generic version. So, in general, inhabited types all act like “true” in the isomorphism. (We have A->B wherever B is a specific inhabited type.)

I’m fully aware of that, and I enjoy the correspondence. What bothers me is that people seem to go from a specific isomorphism (which has been extended greatly, granted) to a somewhat ill-defined general principle.

My frustration comes largely from hearing mistakes. One extreme example is thinking that (untyped!) lambda calculus is the universal logic (capturing first-order logic and any higher-order logic we might invent; after all, it’s Turing complete). Lesser cases are usually just ill-defined invocations (ie, guessing that the isomorphism might be relevant in cases where it’s not obvious how it would be).

Magic Haskeller and Augustsson’s Djinn are provers (or to say it another way, comprehensible as provers, or to say it another way, isomorphic to provers). They attempt to prove the proposition, and if they succeed they output the term corresponding (via the Curry-Howard Isomorphism) to the proof.

I believe they cannot output a term t :: a->b because there is no such term, because ‘anything implies anything else’ is false.

The type on the right-hand side is usually something much more complicated and “creating an output of the desired type” is not trivial.

What is “trivialism”?

https://en.wikipedia.org/wiki/Trivialism I guess.

It is not that difficult to write general-purpose functions in total languages like Coq. You just provide the functions with “fuel” to “burn” during computation. You don’t have to express all features of interest in the type. But then you cannot rely on automation.

“Programs are proofs” slogan is interesting from yet another point of view. What are the theorems for which a complex program is the simplest proof? There can be multiple such “theorems” (“principal types” property doesn’t hold in complex settings).

Well yes, either direct input or wider context (i.e. environment, library). You can only construct values for tautology types with expressions not referring to context.

If course this isn’t quite true. Other interesting things which are said include the idea that a proof of a conjunction can be handled as a

pairof proofs (one for each element of the conjunction), and a proof for a disjunction can be a proof for either side, and so on; so in a sufficiently rich type system, we can leverage the type machinery to do logic, using product typesasconjunctions, sum typesasdisjunctions, and so on. A proposition is then modeled as a type for a proof. An implication is a function type, proven by a procedure which transforms a proof of one thing into a proof for another.Still, I can’t swallow “programs are proofs / propositions are types”.

Is this true because of the absent C, true in the sense of the (larger) model, or just false?

ZF can’t prove that models of ZF exist because proving models of ZF exist is equivalent to proving that ZF is consistent, and ZF can’t prove its own consistency (if it is in fact consistent) by the incompleteness theorem. I don’t think ZFC can prove the consistency of ZF either, but I’m not a set theorist.

Also not a set theorist, but I’m pretty sure this is correct. ZF+Con(ZF) proves Con(ZFC) (see http://en.wikipedia.org/wiki/Constructible_universe), so if ZFC could prove Con(ZF) then it would also prove Con(ZFC).

You can substitute ZFC for ZF throughout the quoted paragraph and it will still be true. I don’t know what you want me to get from the link given; it doesn’t seem to contradict the quoted paragraph. The part where the link talks about Vk entailing ZFC for inaccessible k is exactly what the quoted paragraph is saying.

Zah? The linked answer says,

It goes on to say that this need not hold when we interpret “model of ZFC” in the sense of the (outer) model.

So after reading that, I don’t see how it could be true even in the sense described in the article without violating Well Foundation somehow, but what it literally says at the link is that every model of ZFC has an element which

encodesa model of ZFC, notisa model of ZFC, which I suppose must make a difference somehow—in particular it must mean that we don’t get A has an element B has an element C has an element D … although I don’t see yet why you couldn’tconstructthat set using the model’s model’s model and so on. I am confused about this although the poster of the link certainly seems like a legitimate authority.But yes, it’s possible that the original paragraph is just false, and every model of ZFC contains a quoted model of ZFC. Maybe the pair-encoding of quoted models enables there to be an infinite descending sequence of submodels without there being an infinite descending sequence of ranks, the way that the even numbers can encode the numbers which contain the even numbers and so on indefinitely, and the reason why ZFC doesn’t prove ZFC has a model is that some models have nonstandard axioms which the set modeling standard-ZFC doesn’t entail. Anyone else want to weigh in on this before I edit? (PS upvote parent and great-grandparent.)

Here’s why I think you don’t get a violation of the axiom of well-foundation from Joel’s answer, starting from way-back-when-things-made-sense. If you want to skim and intuit the context,

just read the bold parts.1) Humans are born and see rocks and other objects. In their minds, a language forms for talking about objects, existence, and truth. When they say “rocks” in their head, sensory neurons associated with the presence of rocks fire. When they say “rocks exist”, sensory neurons associated with “true” fire.

2) Eventually the humans get really excited and invent a system of rules for making cave drawings like “∃” and “x” and “∈” which they call ZFC, which asserts the existence of infinite sets. In particular, many of the humans interpret the cave drawing “∃” to mean “there exists”. That is, many of the same neurons fire when they read “∃” as when they say “exists” to themselves. Some of the humans are careful not to necessarily believe the ZFC cave drawing, and imagine a guy named ZFC who is saying those things… “ZFC says there exists...”.

3) Some humans find ways to write a string of ZFC cave drawings which, when interpreted—when allowed to make human neurons fire—in the usual way, mean to the humans that

ZFC is consistent. Instead of writing out that string, I’ll just write in place of it.4) Some humans apply the ZFC rules to turn the ZFC axiom-cave-drawings and the cave drawing into a cave drawing that looks like this:

“∃ a set X and a relation e such that <(X,e) is a model of ZFC>”where <(X,e) is a model of ZFC> is a string of ZFC cave drawings that means to the humans that (X,e) is a model of ZFC. That is, for each axiom A of ZFC, they produce another ZFC cave drawing A’ where “∃y” is always replaced by “∃y∈X”, and “∈” is always replaced by “e”, and then derive that cave drawing from the cave drawing ” and ” according to the ZFC rules.

Some cautious humans try not to believe that X really exists… only that ZFC and the consistency of ZFC imply that X exists. In fact if X did exist and ZFC meant what it usually does, then X would be infinite.

4) The humans derive another cave drawing from ZFC+:

“∃Y∈X and f∈X such that <(Y,f) is a model of ZFC>”,6) The humans derive

yet anothercave drawing,“∃ZeY and geX such that <(Z,g) is a model of ZFC>”.Some of the humans, like me, think for a moment that Z

∈Y∈X, and that if ZFC can prove this pattern continues then ZFC will assert the existence of an infinite regress of sets violating the axiom of well-foundation… but actually, we only have “ZeY∈X” … ZFC only says that Z is related to Y by the extra-artificial e-relation that ZFC said existed on X.I think that’s why you don’t get a contradiction of well-foundation.

Well Foundation in V_alpha case seems quite simple: you build externally-countable chain of subsets which simply cannot be represented as a set inside the first model of ZFC. So the external WF is not broken because the element-relation inside the models is different, and the inner WF is fine because the chain of inner models of external ZFC is not an inner set.

In the standard case your even-numbers explanation nicely shows what goes on — quoting is involved.

I need to think a bit to say what woud halt our attempts to build a chain of transitive countable models...

Well, technically not every model of ZFC has a ZFC-modelling element. There is a model of “ZFC+¬Con(ZFC)”, and no element of this monster can be a model of ZFC. Not even with nonstandard element-relation.

Linked impressive authority says the model has a ZFC-model-encoding element,

plusenough nonstandard quotedZFC axiomsin-model that in-model ZFC doesn’tthinkit’s a ZFC-model-encoding element. I.e., the formula for “semantically entails ZFC” is false within the model, but from outside, using our own standard list of axioms, we think the element is a model of ZFC.Ah, sorry, I didn’t notice that the question is about model of ZFC inside a “universe” modelling ZFC+Con^∞(ZFC)

The linked answer is talking about models of ZFC in an external sense and Eliezer is talking about them in an internal sense.

I’ll make this more explicit, and note that at present I don’t see it affecting anyone’s argument:

ZF can’t prove its own self-consistency and thus can’t prove it has a model. You logician says that’s as it should be, since not every model of ZF contains another model—which seems true provided we treat each model more as an interpretation than as an object in a larger universe—and then says that a model has to be “well-populated”—which AFAICT holds for standard models, but not for all other interpretations like those your logician used in the previous sentence.

I’m still worried about the word “model”. You talk about models of second-order logic, but what

isa model of second-order logic? Classically speaking, it’s a set, and you do talk about ZF proving the existence of models of SOL. But if we need to use set theory to reason about the semantic properties of SOL, then are we not then working within a first-order set theory? And hence we’re vulnerable to unexpected “models” of that set theory affecting the theorems we prove about SOL within it.It seems like you’re treating “model” as if it were a fundamental concept, when in fact the way it’s used in mathematics is normally embedded within some set theory. But this then means you can’t robustly talk about “models” all the way down: at some point your notion of model bottoms out. I don’t think I have a solution to this, but it feels like it’s a problem worth addressing.

My response to this situation is to say that proof theory is more fundamental and interesting than model theory, and pragmatic questions (which the dialog attempted to ask) are more important than model-theoretic questions. However, to some extent, the

problemis to reduce model-theoretic talk to more pragmatic talk. So it isn’t surprising to see model-theoretic talk in the post (although I did feel that the discussion was wandering from the point when it got too much into models).What I got out of this post is that second-order logic is a lie.

And now I understand (and agree with) Johnicholas’ comment from the last post. ZFC

isa hack, but it is a very good hack which fits with our intuitions. However, after reading this post, I think that to use second-order logic is to decieve yourself into thinking that ZFC is the universally, unequivocallybestdefinition of a set. I’m not anywhere close to 100% confident (though I’m still well over 50%) that ZFC is even consistent!That can easily be worked around. One answer (the one I’m intimately familiar with, not necessarily the best one) is to use first-order metalogic, which proves theorem schema rather than theorems (note that most theorem schema end up isomorphic to theorems, as well). This is the approach Metamath takes, and they have created a metalogically complete axiomatization of first-order metalogic with equality, which ends up allowing ZFC to be finitely axiomatized.

Worse. You are being tricked into believing that ZFC is at all a definition of a set at all, while it is just a set of restrictions on what we would tolerate.

In some sense, if you believe that there is only one second-order model of natural numbers, you have to make decisions what are the properties of natural numbers that you can range over; as Cohen has taught us, this involves making a lot of set-theoretical decisions with continuum hypothesis being only one of them.

What I gather is that the real question here isn’t so much “First or second order logic?”, but “Is it sensible to try to convey/instill the idea of ‘largest powerset’ to/in an AI, and if so how?” “Largest powerset” seems both meaningful and simple (for the purpose of applying Occam’s razor), but we don’t know how to capture its meaning using syntactic proof rules (or rather, we know that we can’t). Maybe we can capture the meaning some other way, for example in the form of a “math intuition module” that can arrive at “probabilistic” conclusions about mathematical structures that are defined in part by “largest powerset”?

We know we can’t do that either, although we can do slightly better than we can with monotonic logic (see my response).

I guess the result you’re referring to is that there is no TM whose output converges to 1 if its input is a true statement (according to standard semantics) in second order PA and 0 if it’s false. But don’t we need stronger results to conclude “we can’t do that either”? For example, take a human mathematician and consider his or her beliefs about “the standard numbers” as time goes to infinity (given unlimited computing power). Can you show that those beliefs can’t come closer to describing the standard numbers than some class of other mathematical structures?

Yes, you’re right. I was being too severe. We can’t “capture the meaning” probabilistically any better than we can with classical truth values

ifwe assume that “capture the meaning” is best unpacked as “compute the correct value” or “converge to the correct value”. (We can converge to the right value in more cases than we can simply arrive at the right value, of course. Moving from crisp logic to probabilities doesn’t actually play a role in this improvement, although it seems better for other reasons to define the convergence with shifting probabilities rather than suddenly-switching nonmonotonic logic.)The main question is: what other notion of “capture the meaning” is relevant here? What other properties are important, once accuracy is accounted for as best is possible? Or should we just settle for as much accuracy as we can get, and then forget about the rest of what “capture the meaning” seems to entail?

Supposing that physics is computable, this isn’t necessarily true. For example, if physics is computable in type-theory (or any language with non-standard models), then the non-standard-ness propagates not just to your physics, but to the individuals living in that physics which are trying to model it. Consider the non-standard model of the calculus of inductive constructions (approximately, Martin-Löf type theory) in which types are pointed sets, and functions are functions between sets which preserve the distinguished point. Then any representation of a person in space-time you can construct in this model (of the computation running physics) will have a distinguished point. Since any function to or from such a person-located-in-space-time must preserve this point, I’m pretty sure no information can propagate from the the person at the non-standard time to the person at any standard time. So even if you expect to observe the turing machine to halt whenever you find yourself in a non-standard model, you should also expect to not be able to tell that it’s non-standard

except when you’re actively noticing its non-standardness. Furthermore, since functions must preserve the point, you should expect (provably) that for all times T, the turing machine should not be halted at time T (because the function mapping times T to statements must map the non-standard point of T to the non-standard (and trivial) statement).For a more eloquent explanation of how non-standard models are unavoidable, and often useful, see The Elements of an Inductive Type.

My reply got a bit long:

http://lesswrong.com/r/discussion/lw/g81/re_secondorder_logic_the_controversy/

Eliezer—why is it that you seem to adamant that our physical reality is better modeled by the standard natural numbers rather than some supernatural model? If we try to see whether a physical implementation of a Turing machine provides an inconsistency in PA, then of course it won’t because it will run for fewer than Ackermann(20) iterations, and all number theories say that it won’t halt by then. If we instead lived in something that looked like the game of life and we did expect it to last infinitely long why exactly would we expect time steps to be indexed by some particular model of the natural numbers when we haven’t yet seen more than finitely many of them (and those finitely many are predicted in all models). Claiming that we don’t expect the Turing machine to ever halt because it hasn’t yet seems a little like saying that all times I’ve seen come before 2014, and therefore I expect that all times ever will have this property.

Simplicity is an important consideration. There are many more ways for complex universes to have violated our expectations already, while still making sapience possible, than for simple universes to have done so. That is, the space of simple universes is much more constrained by anthropic principles than is the space of complicated ones; so it is surprising that we seem to have found ourselves in a universe largely describable with very simple and uniform logic and mathematics, since there are many more ways to be complicated than to be simple. And this surprising fact may suggest that exotic logical, mathematical, or metaphysical structures do not exist, or are in some other respect not candidates for our anthropic reasoning. In other words, we can draw a tentative inductive conclusion regarding whether spacetime is nonstandard overall by considering all the ways in which our theories

to datehave been surprisingly successful and general while appealing almost exclusively to thesimplest(which often corresponds to ‘most standard,’ with a few important caveats) conceivable systems. This won’t be conclusive, but it’s at least suggestive.I guess here’s my point. We would not expect to have seen any empirical differences between standard and non-standard models of spacetime yet. There would not be any observable differences until we reach supernatural times. I suppose you might discount the nonstandard models based on their not being computable or something, but I’m not convinced that this is the right thing to do. I also agree that the standard model should be given more weight (due to its conceptual simplicity to pin down) than any particular nonstandard model, but I don’t see why it should be given more weight than all of them put together. The even more concise (or at least less potentially problematic) theory would be to say that we are living in

somemodel of these axioms and not bother to specify which.I’m not sure that it does deserve more weight than all of them put together. That largely rests on

howcomplicated our needlessly complicated models have to be in order to recover the empirical world. If there are a lot ofrelativelysimple ways to perfectly reproduce the empirical predictions of our standard models to date, then that will make a bigger difference than if there are an even greater number of fantastically gerrymandered, convoluted systems.Even though there are more complicated universes that perfectly generate our observations to date than there are simple universes that do so, given proper

subsetsof our universe, assuming a structurally simple universe tends to yield more accurate predictions than does assuming a structurally complex universe.Although in absolute terms there are so many more ways to be complicated than to be simple that given almost any data, most models predicting that data are exceedingly complicated and gerrymandered...

… and although it is not clear whether sentience exists in a larger proportion of complicated worlds than of simple ones (or vice versa)...

… nevertheless it seems clear that a larger proportion of the simple than of complex worlds that predict

someof our experience predictallof it. That is, there are proportionally many more ways to complicatedly predict most of our observations and then spectacularly fail to predict the last few, than there are ways to complicatedly predict most of our observations and then predict the last few as well.This, however, is still speaking very generally; the

degreeof complexity will make a huge difference here, and not all non-standard models areradicallymore complicated.But when we are talking about which model of set theory to use to model our experiences, all of the non-standard models give exactly the same predictions to date (and will continue to do so until we reach non-standard times). And not all of these models are horribly complicated. There’s the model of constructable set theory, where you only have the sets absolutely guaranteed you by the axioms. This gives you a countable model of set theory (and hence not the standard one), but it is still easily describable.

How do we know they haven’t?

Interesting choice of example.

Not objecting to it, just interesting.

Actually, here’s maybe a better way of saying what I’m trying to get at here: What evidence would actually convince you that we lived in a universe given by a non-standard model of the natural numbers?

Before you say “I run a computer program which returns a proof of 0=1 in PA”, think about it for a while. Ignoring the fact that in practice, you would probably suspect that the proof is wrong, might you instead take this as evidence that PA is actually inconsistent. After all, Godel tells us that we can’t really be sure about that. In fact, if you did live in a nonstandard model of PA, and found a nonstandard proof of 0=1, wouldn’t it feel from the inside like you found an inconsistency in PA rather than that you found a nonstandard number?

Was there supposed to be a word after “structured” and before the comma?

For what it’s worth, mathematicians care about first-order logic because it satisfies nicer theorems than second-order logic, in the same way that mathematicians care about Lebesgue integration because it satisfies nicer theorems than Henstock-Kurzweil integration. The statements that mathematicians talk about in many contexts are first-order anyway, so we might as well figure out what there is to say about first-order statements so we can apply it to the relevant situation (in the same way that the functions that mathematicians want to integrate in many contexts are Lebesgue integrable anyway, so...).

In mathematics there is usually a tradeoff between generality and power: the more general a theory is, the less it can say. That’s why abstract algebra textbooks generally talk about groups instead of monoids even though the latter are more general: there are lots of nice theorems about, say, finite groups that just fail miserably for finite monoids, and so it’s much harder to say nontrivial things about finite monoids than it is to say nontrivial things about finite groups.

And first-order logic is a surprisingly powerful tool in pure mathematics: in the guise of model theory it has lots of interesting applications, e.g. Hrushovski’s proof of the Mordell conjecture over function fields.

I do not think the situation is as simple as you claim it to be. Consider that a category is more general than a monoid, but there are many interesting theorems about categories.

As far as foundations for mathematical logic go, any one interested in such things should be made aware of the recent invention of univalent type theory. This can be seen as a logic which is inherently higher-order, but it also has many other desirable properties. See for instance this recent blog post: http://golem.ph.utexas.edu/category/2013/01/from_set_theory_to_type_theory.html#more

That univalent type theory is only a few years old is a sign we are not close to fully understanding what foundational logic is most convenient. For example, one might hope for a fully directed homotopy type theory, which I don’t doubt will appear a few years down the line.

Sure. It has of course been the case that careful increases in generality have also led to increases in power (hence the weasel word “usually”). There is something like a “production-possibilities frontier” relating generality and power, and some concepts are near the frontier (and so one can’t generalize them without losing some power) while some are not.

Actually, in NBG you have explicitness of assumptions and of first-order logic — and at the same time axiom of induction is a single axiom.

Actually, if you care about cardinality, you need a well-specified set theory more than just axioms of reals. Second-order theory has a unique model, yes, but it has the notion of “all” subsets, so it just smuggles some set theory without specifying it. As I understand, this was the motivation for Henkin semantics.

And if you look for a set theory (explicit or implicit) for reals as used in physics, I am not even sur you want ZFC. For example, Solovay has shown that you can use a set theory where all sets of reals are measurable without much risk of contradictions. After all, unlimited axiom of choice is not that natural for physical intuition.

Too many contradictions, I think: it should read “deny that shouting about the standard numbers

wouldmean anything”.(Removed wrong edit here)

Note: this is actually connected to an ambiguity in “ZF believes in all the axioms of ZF”. That system has infinitely many formal axioms, and lacks any finite way to assert them all at once (eg, to use them all in a particular proof). If it could, then it could use a principle appropriately called “reflection” to show that they must all be true inside a particular set—a natural model like the ones Eliezer talks about in the OP, though not provably as large as those—and could thus prove itself consistent.

This discussion seems to equate the question of what language we want an AGI to most often

reason with, and what language we think our universe, or (Tegmark?-)reality in general, is likely ‘written’ in. But there may be practical virtues of one language over the other that do not translate intodistinct reasonsto believe that reality is written in the more practical language; so the two questions should be kept separate to some extent. ‘Think like reality’ is aheuristicfor helping alleviate dissonance between our beliefs, and better aligning our beliefs with our goals; but it admits of exceptions, both epistemically and instrumentally.One simple way to answer the AGI-specific question is to have the AGI ‘think’ in whichever language seems more useful (keeping in mind that having true beliefs is usually, though not always, very useful!), but try as much as possible to keep the AGI from assuming that the axioms and inference rules with which it (initially?) thinks

mustbe the same as the ones that best characterize ultimate reality. Instead, whichever language feels more ‘natural’ to the AGI, we want it to be able to do the same sort of inner-dialogue reasoning that Eliezer himself is doing in this series of vignettes, weighing the empirical evidence for and against the universe’s ‘objective second-orderedness’ and reaching a conclusion proportioned to the evidence. The AGI should, as much as is feasible and practical, be reasonably skeptical of its own basic mode of operation,even ifwe limit its capacity to self-modify (either directly, or via its priors) in such a way that itcan’t helpbut think in terms of certain axioms the evidence reveals to be false.A limiting case: Supposing the AGI discovers Sylvan’s Box, we want it to update in favor of dialetheism, even if it was initially most practical to code it to think in terms of classical logic. At least, that’s the ideal; it may be that the costs of actually programming an AGI to be able to take seriously such a vast theory-space, given just how unlikely dialetheist scenarios are, outweighs the virtue of permitting the AGI to update in favor of irreconcilable contradictions. But that’s a pragmatic consideration; we should still hold

in principleto such Tarski litany-instances as:If any proposition is both true and false, I desire to believe that some proposition is both true and false.

If disjunctive syllogism is not always valid, I desire to believe that disjunctive syllogism is not always valid.

If some propositions are neither true nor false, I desire to believe that some propositions are neither true nor false.

Similarly, a strictly first-order logician should grant: ‘If the universe is structured second-orderedly, I desire to believe that the universe is structured second-orderedly.’

Even the alien race who cannot

thinkin second-order terms should at least grant: ‘If I cannot conceive of how the universe is structured, I desire to believe that I cannot conceive of how the universe is structured.’The flip side of this: If we found it more useful to make the AGI reason paraconsistently (i.e., to notice contradictions in its own beliefs and non-explosively reason with them in the hopes of rooting out what went wrong, like humans do), this on its own would in no way give us reason to program the AGI to believe that the physical (or platonic) universe harbors objective contradictions of any sort, or to be easily (or even possibly) persuaded of this proposition by future evidence.

We should get this automatically in

manycases. Classical logic can think about dialetheism and other possibilities by inventing negative predicates for every predicate, say. The rules relating positive to negative can then be chosen at convenience. (In first-order logic, we would need to invent a negative predicate separately for each positive; in second-order logic, we could make a general statement to the effect that each positive predicate has a negative, and make statements about how they combine.)That’s one reason we don’t need to worry too much about which logic is correct; as johnicolas keeps pointing out, the existence of encodings means (at least in many cases) we get the same power regardless of what choice we make.

In additionto conventional negation, or in lieu of it? Remember that, on dialetheism, we may not know in advance which falsehoods are going to end up later turning out to also be true. So we’dneverbe able to assert real negations (as opposed to ‘negativized’ predicates) if we want to completely protect ourselves from ever going paraconsistent/trivialist (depending on where our reasoning system puts us when we encounter a contradiction).This issue isn’t just one of computational power; it’s of choosing priors for our AGI’s basic interpetations of reality. I distinguished the two questions “what language we want an AGI to most often reason with” v. “what language we want the AGI to think our world is written in;” if we think that the AGI’s functionality will best be met if it has a somewhat false view of the world in some respect (which is dangerous but might have non-insane justifications), then the latter question will be distinct from “what language the world is actually written in.” But I can agree with you that it’s mostly just an engineering problem to answer the first question (i.e., we don’t need to settle every metaphysical debate before we can start coding!), without thinking this trivializes the second question.

What I’m saying is

it doesn’t matter, and therefore doesn’t obviously make sense to ask, what language the world is written in.Even if the worldactually runs onparaconsistent logic, we can still do fine with classical logic. It will alter our prior somewhat, but not a while lot, because the encoding of one into the other isn’t so hard.Eliezer is attempting to make some comparison between the structure of the actual world and the structure of the logic, to determine which logic most seems to be what the world is written in. But the existence of encodings of one logic in another makes this exercise somewhat unnecessary (and also means that the structure of the world is only weak evidence for “what logic it was written in”).

To answer your question:

in addition, because having strong negation in addition to weak negation only increases the expressive power of the system. It does not increase the risk you mention, because the system is still choosing what to believe according to its learning algorithm, and so will not choose to believe strong negatives that cause problems.EDIT:

In response to srn347, I would like to clarify that I do not intend to belittle the importance of comparing the effectiveness of different logics. The existence of an encoding will only get you specific things (depending on the nature of the encoding), and furthermore, encodings to not always exist. So, specifically, I’m only saying that I don’t see a need for paraconsistent logic. Other possibilities need to be judged on their merits. However, looking for encodings is an important tool in that judgement.

‘Doesn’t matter’ in what sense? ‘Doesn’t make sense’ in what sense?

Be careful not to confuse the

literalquestion ‘Is our world actually a simulation programmed in part using second-order logic?’ from the more generic ‘is our world structured in a second-order-logic-ish way?’. The latter, I believe, is what Eliezer was concerned with—he wasn’t assuming we were in a simulation, he was asking whether our universe’s laws or regularities are property-generalizing in a second-order-style way, or schematic in a first-order-style way, the importance being that schematic laws are more likely to vary across spacetime. Similarly, the important question of whether there are dialetheias should not be confused with the more occult question of whether our universe is a simulation written using paraconsistent logics (which, of course, need not involve any true contradictions).This is confusing a number of different issues. Paraconsistent logic need not be dialetheist, and not all dialetheists favor paraconsistent logic (though, for obvious reasons, nearly all of them do). You’re reconfusing the issues I disentangled in my earlier posts, while (if I’m not misunderstanding you) trying to make a scrambled version of my earlier point that our choice of logic to have the AGI reason with is partly independent of our choice of logic to have the AGI think the universe structurally resembles.

If you think it doesn’t make sense to ask what logics our universe structurally resembles, remember that the structural similarity between arithmetic operations and physical processes is what makes mathematics useful in the first place, and that we should not be surprised to see similarities between the

simplestsystems we can come up with, and a universe with remarkably simple and uniform behavior, at the fundamental level.That’s assuming it never makes a mistake early on. If it does, and it’s not a paraconsistent reasoner, its epistemology will explode. You underestimate the virtues of paraconsistent logics for complex reasoning systems; they’re very valuable for epistemic self-debugging, as opposed to shutting the whole thing down if even the smallest mistake occurs. And, again, none of this requires the slightest commitment to dialetheias being possible.

I’m imagining a better-designed system than that. Incoming facts together with internal reasoning would constantly be invalidating various theories; the system should have a bunch of commitments which explode on it, but rather, should always be seeking out contradictions in order to invalidate theories.

Perhaps so. I don’t actually understand what it would mean to reason with a paraconsistent logic while still believing a classical logic. Is something like that possible? Or (as it seems) would reasoning via a paraconsistent logic have to reduce the number of tautologies I recognize? I’ve sort of ignored paraconsistent thought merely because I’m not interested in dialetheism...

Yes, re-reading your original post, I can see that I missed that point (and was partially re-stating it). The actual point I’m trying to make is basically to respond to this part:

by saying that we get much of that automatically. Really, what I should have said is that we can achieve this by choosing as expressive a logic as possible, to ensure that as many other logics as possible have good embeddings in that logic.

This could alternatively be stated as an expression of confusion about why you would want to address this concern separately from the choice of logic, since if we attempted to implement the aforementioned inner-dialog reasoning, we would surely have to provide a logic for the reasoning to take place in.

Not only is it possible, but probably over 99% of people who employ paraconsistent logics believe in classical-logic metaphysics, or at least something a lot closer to classical-logic metaphysics than to dialetheism. Paraconsistent logic is just reasoning in such a way that when you arrive at a contradiction in your belief system, you try to diagnose and discharge (or otherwise quarantine) the contradiction, rather than just concluding that anything follows. Dialetheism is one way (or family of ways) to quarantine the contradiction so that it doesn’t yield explosion, but it isn’t the standard one. Paraconsistent reasoning is a proof

methodology, not a metaphysical stance in its own right.Within reason. We may not want the AGI to be so expressive that it can express things we think are categorically meaningless and/or useless. And the power to express some meaningful circumstances might come with costs that outweigh the expected utility. I suppose one way to go about this is to pick the optimal level of expressivity given the circumstances we expect the AGI to run into, but try to make the AGI able to self-modify (or generate nonclassical subsystems with which it can carry on a reasonable, open-minded dialogue) to increase expressivity if it runs into situations that seem

especiallyanomalous given its conception of what a circumstance or factis.The basic problem is: How does one assign a probability to there being true propositions that are intrinsically ineffable (for non-complexity-related reasons)? A good starting place is to imagine a being that had far less logical expressivity than we do (e.g., someone who could say ‘p’ / ‘true’ or (as a primitive concept) ‘unknown’ / ‘unproven’ but could not say ‘not-p’ / ‘false’), and reason by analogy from this base case.

Ideally, if two subsystems of an AGI are designed to reason with different logics, and we want the two to argue over the best interpretation of a problem case, we would settle the dispute by some rule like ‘Try to prove to the satisfaction

of your opponentthat their view leads to too many circumstances we both agree are problems. Avoid question-begging arguments, i.e., ones that assume that your logic is the right one, when that is precisely what is under dispute; seek arguments that both of you can agree are valid, or even arguments that you think are invalid but that you think problematize your opponent’s position.’ Of course, if we need a decision procedure in cases where the AGI arrives at a stalemate, we may need to assume that a certain side ‘wins’ if there’s a draw.I think you are

overestimatingthe difficulty of the mathematical problem here! To quote JoshuaFox:But once those problems are solved, we do

notneed to additionally solve the problem you are highlighting, I think...When it comes down to it, wouldn’t this be just like some logic that is the common subset of the two, or perhaps some kind of average (between the probability distributions on observations induced by each logic)? Again, I think this will be handled well enough (handled

better, to be precise) by a more powerful logic which can express each of the two narrower logics as a hypothesis about the structure in which the environment is best defined. Then the honest argument you describe will be a result of the honest attempt of the agent to estimate probabilities and find plans of action which yield utility regardless of the status of the unknowns.If I may interject (assuming it isn’t too early to start proposing solutions), it does turn out to be the case that computability logic is a superset of linear logic, which encodes resource-boundedness and avoids material entailment paradoxes, intuitionistic logic, which encodes proof/justification, and classical logic, which encodes truth. To accept less is to sacrifice one or more of the above attributes in terms of expressiveness.

Thanks; I certainly didn’t intend to say that all logics are equivalent so it doesn’t matter… edited to clarify...

True, but these kinds of tricks might make hash of its utility function.

Very true! But that is a problem that

alreadyneeds to be solved, separately. :)Yes, and friendliness is already a hard enough problem, without people dumping the hard parts of other problems into it.

I don’t think you can avoid solving this one in the first place, so I wouldn’t call it dumping. (The problem is already inherent in the friendliness problem, so we’re not adding anything new to the friendliness problem by keeping this out of the logic problem.) The fact is that goals which have to do with external objects need to make use of “fuzzy” descriptions of those objects. There should be some reasonable way of formalizing fuzziness. (Of course standard fuzzy logic theory is woefully inadequate, but it has roughly the right goal, so we can re-use the name.) Utility needs to be a function of some “empirical clusters” in reality, specified independently from the underlying physics (or complete list of border cases) of those entities. Encoding a different logic into our logic is not much different from encoding a physics, so I think there is not a separate problem to be solved.

hey that was really interesting about whether or not “all properties” in second order logic falls victim to the same weirdness as the idea of a natural number does in first order logic. I never thought of that before.

oh yeah do you have a reference for “there’s tricks for making second-order logic encode any proposition in third-order logic and so on.”? I would like to study that too, even though you’ve already given me a lot of things to read about already!

http://plato.stanford.edu/entries/logic-higher-order/

doesnt answer my question

It does.

Section 4 in the article linked in the grandparent discusses higher-order logics and how they can all be simulated by and reduced to second-order logic.

https://en.wikipedia.org/wiki/Nonfirstorderizability

I fail to see how this is evidence of Non

secondorderizability of some possible sentences.There is no known trick to encode

allsentences expressible in higher-order logics into first-order logic, but thereissuch a trick to encode all sentences expressible in higher-order logics insecond-order logic.The trick in question is described in the SEP article. Doesn’t that suffice as a reference and starting point for studying the notion that second-order logic can encode higher-order logics?

i misread it XD trhanks for your help

When I learned about Lowenhein-Skolem, I just thought “lol first-order logic sux”. I didn’t question whether distinguishing among infinite cardinalities was a meaningful thing for humans to do. Eliezer is clearly a higher-order student of logic.

Er, to be clear about not taking credit, this is a

long-running philosophical debate in mathematics. The parts about “Well, what physics, then?” I haven’t read elsewhere but that could just be limited reading.And what about Skolem’s Paradox? Does it make you think “lol set theory sux”? :)

As far as complexity-of-logic-theories-for-reason-of-believing-in-them, that should be proportional to the minimal Turing machine that would check if something is an axiom or not. (Of course, in the case of a finite list, approximating it to the total length of the axioms is reasonable, because the Turing machine that does “check if input is equal to following set:” followed by set adds a constant size to the set—but that approximation breaks down badly for infinite axiom schema).

That might sound weird, but do we have any evidence that our time follows the standard numbers (or a continuous version of them) only? Is it even possible to get such evidence? Maybe our turing machine (looking for contradictions in PA) stops at −2* - “we” cannot see it, as “we” are on the standard numbers only, experiencing only effects of previous standard numbers.

Time is a property of the map, not a property of the territory. The territory probably doesn’t have anything that could reasonably be called time, along the lines of the timeless physics posts.

In this approach, anything is a property of the map only.

(One thing is to say that time isn’t, from the point of view of the fundamental microscopic laws, as universal and important as might intuitively appear. Another thing is to say that it doesn’t exist in the territory. There are natural clocks, such as decaying radioctive elements, whose existence has little common with human “maps”.)

Barbour’s universe/multiverse is much

morecomplex than a classical timeful universe. And it isn’t especialy likely, either. Specualtion =/= proof. To say the least.Could you expand on that? I had the impression that timeless physics was larger in RAM, but not on disk—where it counts for MML, Solomonoff Induction, etc.

I’m afraid I don’t understand your RAM/disk metaphor.

I took it to mean that it requires you to keep track of more state at any given time, but that the whole system has less logical complexity when you write it all down. Sort of like the difference between estimating the future state of a restricted N-body problem with numerical methods vs. solving it directly.

There are no “given times” in Barbour’s universe. And who’s the “you”? An internal or external observer?

“Doesn’t physics say this universe is going to run out of negentropy before you can do an infinite amount of computation?” Actually, there is a [proposal that could create a computer that runs forever.

Are these time crystals made up of protons? If so, proton decay might eventually cause it to disintegrate.

“If there’s a collection of third-order axioms that characterizes a model, there’s a collection of second-order axioms that characterizes the same model. Once you make the jump to second-order logic, you’re done—so far as anyone knows (so far as I know) there’s nothing more powerful than second-order logic in terms of which models it can characterize.”

You clearly can state Continuum Hypothesis in the higher order logic, while a 2nd order formulation seems elusive. Are you sure about it?

Eliezer is correct. See SEP on HOL.

Link is good, but I guess direct explanation of this simple thing could be useful.

It is not hard to build explicit map between R and R² (more or less interleaving the binary notations for numbers).

So the claim of Continuum Hypothesis is:

For every property of real numbers P there exists such a property of pairs of real numbers, Q such that:

1) ∀x (P(x) → ∃! y Q(x,y))

2) ∀x (¬P(x) → ∀y¬Q(x,y))

(i.e. Q describes mapping from support of P to R)

3) ∀x1,x2,y: ((Q(x1,y)^Q(x2,y)) → x1=x2)

(i.e. the map is an injection)

4) (∀y ∃x Q(x,y)) ∨ (∀x∀y (Q(x,y)-> y∈N))

(i.e. map is either surjection to R or injection to a subset of N)

These conditions say that every subset of R is either the size of R or no bigger than N.

Second order logic may be able to express all expressible statements in 3rd order logic, 4th order, up to any finite Nth order, but perhaps there exist ∞th order statements that 2nd order logic cannot express. Albeit, such statements could never be fully decidable and would thus be, at best, semidecidable or co-semidecidable. This may not be complete, but science works the same way.

Nope, 2nd-order logic can discuss Nth order where N is an infinite ordinal, as well [for some class of describable ordinals]. Maybe the argument could be made that 2nd-order logic cannot discuss a universe containing

allthe ordinals, and in that case maybe we could argue that set theory can, and is therefore more powerful. But this is not clear.Independently of that, we might also believe that category theory can describe things beyond set theory, because category theory regularly investigates categories which correspond to “large” sets (such as the set of all sets). There are ways to put category theory into set theory, but these translate the “large” sets into “small” sets such as Grothendeik universes, so we could still argue that the semantics of category theory is bigger.

However, even if this is the case, it might be that 2nd-order logic can encode category theory in the same way that it can encode 3rd-order logic. We can add axioms to 2nd-order logic which describe non-standard set theories containing “big” sets, such as NF. This may allow for an “honest” account of category theory, in which categories really can be defined on “big” sets such as the set of all sets.

I don’t think that infinitary logic is the same as ω-order logic.

Wouldn’t ω-order logic be a subset of infinitary logic? Or do I have it backwards?

I don’t think they have anything to do with each other. Infinitary logic is first-order logic with infinite proof lengths. Second-order logic is finite proof lengths with quantification over predicates. I don’t know if there’s any particular known relation between what these two theories can express.

Third order logic talks about something that we don’t understand, and we don’t understand it in exactly the same way that the first-order aliens can’t understand second order logic.

One of the things third order logic might be able to do is determine the number of steps a Turing will take before halting in a constant-time algorithm.

Actually, the halting problem has been proven impossible, for if third order logic can be used to determine if/when a turing machine halts, a turing machine could run that on itself, but would take necessarily more than N steps to determine that it halts (where N is the number of steps it halts in), which contradicts. Third-order logic may be able to do the inconceivable, but not the noncomputable.

That’s not the standard proof, at least not the one I know: You haven’t proven that it would necessarily take more than N steps. The way it goes is that you assume there is a machine that halts iff the machine you give it runs forever given itself as input; now if you run it on itself, if it halts, it is wrong, and if it doesn’t halt, it is also wrong, meaning our assumption was wrong.

Turing machines can’t use third order logic. That might be a limitation of the programmer, or it might be a limitation of deterministic machines. A Turing machine can’t ever compute anything about it’s own output; third order logic might allow deterministic machines which can compute the results of their own output.

I suspect that there is a halting problem where the halting status of machines which only use second order logic cannot be computed in third order logic, just like limits cannot be computed in Peano arithmetic.