1: Got spoiled on this, so just “recalling”
Glossing over bijecting 2^N with R (use binary expansions, and then handle the countably many ‘collisions’ like .111… = 1.000… separately), let’s prove S can’t surject onto P(S). Suppose we have a function f: S → P(S); each value f(s) is a subset of S. Let’s build a set U that, for every s, asks whether s in f(s) and then does the opposite. That is, U = {s : s not in f(s)}. Now, if f is a surjection then there’s some u s.t. f(u) = U. Is u in U? Well, if it is, then it isn’t, but if it isn’t, then it is.
This made me notice that it’s like Russell’s paradox! So I guess I am glad I ‘recalled’ it.
If you haven’t seen that one before: Imagine that you could make a set of all sets. Prove a contradiction. (whenever you have a set, e.g. the integers, you can do things like make {x^2 : x in the integers, x is even} which means “the set of all values of x^2, where x ranges over integers that satisfy (x is even)” aka the even squares).
If we have a set U of all sets, then we’d have an f: U → P(U) because… P(U) = U! So just by taking f = identity, we get a contradiction from Cantor! More explicitly, walking through the proof again, we make S = {x : x in U, x not in x}. Now, is S in S? (aka is S in f(S)?) Well, if it is, then it isn’t, but if it isn’t, then it is. Womp womp.
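As a quick finite sanity check of the diagonal step (the particular S and f below are just an arbitrary example), the diagonal set U really is missed by every s:

```python
# For ANY f: S -> P(S), the set U = {s : s not in f(s)} is not in the image of f.
S = {0, 1, 2}
f = {0: {0, 1}, 1: set(), 2: {2}}  # an arbitrary f: S -> P(S)

U = {s for s in S if s not in f[s]}  # here U = {1}
# U differs from f(s) at s itself, so no s maps to U:
assert all(f[s] != U for s in S)
```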
2&3:
Well, in the previous exercise, our ‘unfixy’ map (call it f: T → T) is the one that sends 0 to 1 and 1 to 0. So now we can mimic the previous proof: let g: S → (S → T). For every s, we look at g(s), and then look at g(s)(s). We then apply the ‘shift’ map f(g(s)(s)) to get an element of T depending on s, so we have a function h(s) = f(g(s)(s)). If g is surjective then there’s some s’ (I really want to write s^*, but prime is quicker) such that g(s’) = h. Now, what’s g(s’)(s’)? Well, it’s h(s’) = f(g(s’)(s’)). But then we have a fixed point!
In general, any s’ that gets sent to our function h provides us our fixed point g(s’)(s’)
Bonus Problem: Construct a fixed point combinator in the lambda calculus. That is, define a pure function with no free variables called fix such that fix f = f (fix f)
Well, in the previous problem our fixed point was g(s’)(s’), where s’ is such that g(s’) = h, and h(s) = f(g(s)(s)).
Now we’ll motivate what we do next by doing some intuitive but not really correct things, as that’s how I got to the actually correct thing.
So our fix should be something kinda like (using pseudo python notation so I don’t have to break out the latex)
lambda f: g(s’)(s’)
I can’t quite remember how I actually thought up the next step, but I was thinking something like “Well, s’ is like, the code that gets sent to the function h”. So intuitively g(s’)(s’) should be something like (lambda s: f(g(s)(s))) (code for previous)
The thing to realize now is that the “code” for a lambda calculus expression… is just that expression. In other words, why don’t we just pick g to be the identity (lambda x: x)? That is, if we say that L is the set of lambda functions and L(A->B) is the set of lambda functions from A to B, then L(L → T) is a subset of L because L and T are both subsets of L. Looks kinda weird, but I guess that’s just what happens when you don’t have types?
It’s surjective because any function L → T can be passed in to the identity to just get itself.
So we now have h(s) = f(s(s)), s’ = h, and we want s’(s’).
Writing it out as a function (and using the usual convention that concatenation is application),
fix = lambda f: (lambda s: f(s s)) (lambda s: f(s s))
We can check it: fix f = (lambda s: f(s s)) (lambda s: f(s s)) = f( (lambda s: f(s s)) (lambda s: f(s s)) ) = f(fix f)
If you don’t know, this is the Y combinator. And now I finally know where the formula comes from!
Okay, now use that to implement the factorial function recursively. I already know the answer to this unfortunately, but if you don’t then you should see if you can! You’re allowed to take for granted that there’s lambda calculus functions for basic arithmetic and if-then-else.
let F = lambda f, n: if n == 0 then 1 else n*f(n-1)
Now, fact = fix F. Why? Well, because if fact(n-1) = (n-1)! then F(fact)(n) = n*(n-1)! = n!. But F(fact)(n) = fact(n) so this means that fact(n-1) = (n-1)! implies fact(n) = n!, and then we can prove fact works on all integers by induction.
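This can actually be run in Python, with one caveat: Python evaluates arguments eagerly, so the fix above would loop forever as written. The standard workaround is to eta-expand s s into lambda v: s(s)(v) (this call-by-value variant is usually called the Z combinator):

```python
# Call-by-value fixed-point combinator: the eta-expansion delays s(s)
# until the recursive function is actually applied to an argument.
fix = lambda f: (lambda s: f(lambda v: s(s)(v)))(lambda s: f(lambda v: s(s)(v)))

F = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)
fact = fix(F)
assert fact(0) == 1 and fact(5) == 120
```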
4:
For short I’ll define Bool := {T,F}.
If we had a surjective computable function f: S → Comp(S, Bool), then this means f’: S^2 → Bool is in Comp(S^2, Bool).
Let’s try to adapt the previous exercise. For any s, f(s) is a program from strings to true/false. Let’s ask whether it says itself is true: f(s)(s). Let’s then invert it to make h(s) = Not(f(s)(s)). Now, f(s)(s) = f’(s,s) so this is h(s) = Not( f’(s,s) ) which is computable since f’ is.
Now since f is surjective, there’s some s’ s.t. f(s’) = h, but then f(s’)(s’) = h(s’) = Not(f’(s’,s’)) = Not(f(s’)(s’)) but Not has no fixed points.
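The diagonal step here can be sanity-checked on a toy model with only finitely many ‘programs’, indexed by strings (the particular predicates are arbitrary illustrations):

```python
# A finite stand-in for f: S -> Comp(S, Bool).
f = {
    "p1": lambda s: len(s) % 2 == 0,
    "p2": lambda s: s.startswith("p"),
    "p3": lambda s: "3" in s,
}
h = lambda s: not f[s](s)  # h(s) = Not(f'(s, s))
# h disagrees with every f[s'] at the input s' itself, so h is not among them:
assert all(h(s) != f[s](s) for s in f)
```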
5:
Imagine we had a computable halt function. We would like to build a surjective computable function f: S → Comp(S, Bool).
For all r, we can uncurry halt to get halt_r(s) = halt(r,s) which is thus computable for all r. Therefore we have a function S → Comp(S, Bool).
Is it surjective? Well, what we want to know is if, given a program r assigning truth values to strings (that is an element of Comp(S, Bool)), can we find a program s such that s halts on exactly the inputs that r says it should halt on?
Well, when you say it like that, it’s really easy! We can make a program that runs r on the input, returns if it’s true, else loops forever. In other words, given any computable set, we can make a program whose set of halting inputs is that set.
Therefore, the uncurried halt has to be uncomputable.
Bonus: Rice’s theorem (surprisingly easy, though it took me a while):
Suppose P is a property/set of programs that’s nontrivial (there’s a program without the property and a program with it) and extensional (it only depends on what the partial computable function that the code evaluates to does). Then deciding whether a program is in P is uncomputable.
Let Y/N be two programs that are/aren’t in P. Now let’s ask whether the program L = “loop forever” is in P. And say that “foo{s}bar” for a string s is the string with s appearing between “foo” and “bar” (instead of literal character s), like a format string.
If it does:
Given any program q and input i, ask whether
r = “lambda x: dummy = {q}({i}); return {N}(x)”
is in P
If q(i) halts then eval(r) = eval(N), so r isn’t in P. Else eval(r) = eval(L) so r must be in P.
So you can make a computable function that makes the string r and feeds it to your P-detector.
Likewise if L doesn’t have the property, you can use Y.
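The string-building reduction can be sketched concretely (the helper name build_r and the sample q, i, N are all illustrative; since a Python lambda can’t contain statements, the two calls are sequenced with a tuple instead of a dummy assignment):

```python
# r runs q(i) first; only if that call halts does r get to behave like N.
def build_r(q_src, i_src, n_src):
    return f"lambda x: (({q_src})({i_src}), ({n_src})(x))[1]"

# With a q that halts on i, r is extensionally equal to N:
r_src = build_r("lambda n: n + 1", "0", "lambda x: len(x)")
r = eval(r_src)
assert r("abc") == 3  # behaves like N = lambda x: len(x)
```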
6:
Suppose we have X, f such that f: X → C(X, S) and f a continuous surjection. Then we can look at h(x) = g(f(x)(x)) for some continuous g: S → S we’ll try to define later. Then h is continuous, so there’s some x’ such that f(x’) = h, but f(x’)(x’) = h(x’) = g(f(x’)(x’)). This tells us we want a g without fixed points. We can just rotate the circle a little to get our desired g.
Note that, for example, this proof also works if instead of the circle we had the torus, or the cylinder, or (I think) the Klein bottle. Likewise, it’ll work for the sphere. But it won’t work for the disk, by the Brouwer fixed point theorem.
7&8:
Theory: Had to look this one up. But, I then spent a while manipulating it until it made sense.
Firstly, a digression. Imagine we had a total computable function F. Then since we have eval: N → (N → N), we might try to find a fixed point by diagonalizing with h(x) = F(eval(x)(x)). The problem we’ll run into, however, is that we’ll get something that might not exist (due to not halting).
What if we only wanted an extensional fixed point, that is, a program e such that eval(e) = eval(F(e)) (as in, they encode the same computable function)?
Well, now we could try taking g(x) to be some code that’s extensionally equivalent to eval(x)(x). That is, eval(g(x))(y) = eval(eval(x)(x))(y). This is easy to do: g(x) = “lambda y: eval(eval({x})({x}))(y)”, and we can define the computable function g that returns this string with the input subbed in for x.
Now, let h(x) = F(g(x)). That is, let h = F compose g. Since this is computable, there’s some program e such that eval(e)(x) = h(x)
namely e = “lambda x: [F]([g](x))”
To reiterate: we have
eval(g(x))(y) = eval(eval(x)(x))(y)
eval(e)(x) = h(x)
And now eval(e)(e) = h(e) = F(g(e)).
Now, g(e) is not literally equal to eval(e)(e). But it’s extensionally equal! And so we are done.
Another lens to view this: imagine the Y combinator Y f = (lambda x: x x)(lambda x: f(x x))
(after one application this combinator is the same as the usual Y f = (lambda x: f(x x))(lambda x: f(x x)))
Imagine that every time you see a function application like x x, you replace it with eval(x)(x). Our h(x) is like eval(x)(x), our h is like (lambda x: x x), our e is like lambda x: f(x x), and so h(e) is Y f which is equal to f (Y f) = f (h(e))
So, we’ve just proved Rogers’s theorem: every total computable function has an extensional fixed point.
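To see the construction actually run, here’s a toy model in Python where ‘programs’ are source strings of one-argument lambdas and Python’s eval plays the universal machine (the particular F is just an example transformation; everything else follows the proof above):

```python
# g(x) is code extensionally equal to eval(x)(x).
def g(x):
    return f"lambda y: eval(eval({x!r})({x!r}))(y)"

# An example total transformation F: F(p) ignores its input and returns len(p).
def F(p):
    return f"lambda y: len({p!r})"

# e is code for h = F . g (source strings for g and F written out inline):
g_src = "lambda x: f'lambda y: eval(eval({x!r})({x!r}))(y)'"
F_src = "lambda p: f'lambda y: len({p!r})'"
e = f"lambda x: ({F_src})(({g_src})(x))"

p = g(e)  # the extensional fixed point
assert p != F(p)  # not literally the same code...
assert eval(p)(0) == eval(F(p))(0) == len(p)  # ...but the same function
```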
Now we’ll prove the Kleene recursion theorem: given any computable Q of two variables, there’s some program p such that eval(p) = lambda y: Q(p, y).
This is easy with Rogers’s theorem: just define F so that eval(F(p))(y) = Q(p,y), e.g. via F = lambda p: "lambda y: [Q]({p},y)"
For example if Q(p,y) = p then we have a quine. If Q(p,y) = f(p) then we have a program that when evaluated (given any argument) returns the function f applied to its code. Kleene’s recursion theorem allows us to feed in a second argument that gets passed to f.
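In a toy Python model where programs are lambda source strings and eval is the universal machine, taking Q(p, y) = p yields exactly that first example: a program that returns its own source (a nice segue into the practice part). Everything here follows the construction above; the model itself is just an illustration.

```python
# g(x) is code extensionally equal to eval(x)(x).
def g(x):
    return f"lambda y: eval(eval({x!r})({x!r}))(y)"

g_src = "lambda x: f'lambda y: eval(eval({x!r})({x!r}))(y)'"
# With Q(p, y) = p, F(p) is code for the constant function returning p:
F_src = "lambda p: f'lambda y: {p!r}'"

e = f"lambda x: ({F_src})(({g_src})(x))"
p = g(e)
assert eval(p)("ignored") == p  # eval(p)(y) = p: the program outputs its own code
```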
Practice: I also had to look up how to make a quine :c
The idea is to have a variable that stores the text of your program, but with a to-be-filled-in template in the part that defines the variable. Then, print(fill_in(variable))
(I think LW’s code blocks in spoiler tags aren’t working right, so not putting this in a code block)
chr(39) is the apostrophe
a = 'a = {}{}{}; print(a.format(chr(39), a, chr(39)))'; print(a.format(chr(39), a, chr(39)))
To apply a function, just do very similar:
a = 'a = {}{}{}; f = lambda s: "i\'m " + str(len(s)) + " long"; print(f(a.format(chr(39), a, chr(39))))'; f = lambda s: "i\'m " + str(len(s)) + " long"; print(f(a.format(chr(39), a, chr(39))))
Which should dutifully report that it’s 190 long
9: This suddenly became obvious a few days later—not sure what was holding me up. Also, what does “Syn” stand for?
We have f: S_1 → (S_1 → S_0), given by sending a P in S_1 to the function that sends any Q in S_1 to P([Q]) (where this means we substitute the Gödel number of Q into P). It’s surjective (onto Syn(S_1, S_0)), because an element of the codomain is given by substituting Q into some fixed formula, which we can take as our P, and then f(P) is what it should be.
Now, what should our map S_0 → S_0 be? Well, we want to find a fixed point of some P in S_1… so… why don’t we use P? All we gotta do is say that P sends a formula Q in S_0 to P([Q])!
So, now we’d like h(Q) = P([ f(Q)([Q]) ]) as usual. What we really need, though, is a substitution function f’ that takes in two Gödel numbers n,m and returns the Gödel number of the formula with m plugged into the formula corresponding to n. We can do this by (handwaves) doing arithmetic. So then what we really have is h(Q) = P(f’([Q], [Q])).
Then h is in Syn(S_1, S_0) because we are just substituting Q into this formula. So there’s some R such that f(R)([R]) = P(f’([R], [R])), so if we set psi = R([R]) then we have psi = P([psi]).
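In symbols (writing sub for the substitution function f’ above), the whole construction collapses to:

```latex
% h has one free variable x; R is h itself; \psi substitutes h into itself.
h(x) := P(\mathrm{sub}(x, x)) \qquad
\psi := h(\ulcorner h \urcorner)
      = P(\mathrm{sub}(\ulcorner h \urcorner, \ulcorner h \urcorner))
      = P(\ulcorner \psi \urcorner)
```

since sub(⌜h⌝, ⌜h⌝) is, by definition, the Gödel number of h(⌜h⌝), i.e. of psi itself.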
I’m not exactly sure why I have equality when I should only have logical equivalence. Looking at the Wikipedia page, it looks like instead of having a formula f’, what we really have is a formula d(Q,y) <-> y = f’(Q,Q), and then instead of my h(Q) we have h(Q) = \exists y : d(Q,y) and P(y). The reason you need to do this is that you can basically code up a Turing machine that does the substitution, but in order to represent it in logic you need to say something like “this program has output equal to y”.
10: “Bew”? I’ll just say Prove. I’ll also elide the quoting in the prove operator. You should imagine I put it there.
Anyways, when I first wrote this I kept getting ‘proofs’ that PA proves itself inconsistent. The error was that
we do not have |- P → Prove(P) (that is, PA doesn’t prove it’s complete). We merely have (|- P) → (|- Prove(P)) and |- Prove(P) → Prove(Prove(P)).
Define F to be your favorite contradictory statement, like ((1=1) and (1!=1)). I’ll also use |- (‘turnstile’) for “PA proves...”.
By the previous exercise, -Prove must have some fixed point. Call it P. Then |- -Prove(P) <-> P
Now, let’s (in our metatheory) suppose |- P. We have:
|- P → -Prove(P)
|- Prove(P) (from metatheorem mentioned in next exercise)
|- P → F
which would make PA inconsistent.
If instead |- -P, then we’d have:
|- Prove(P)
|- Prove(-P) aka Prove(P → F)
|- Prove(F) (since |- Prove(A → B) → (Prove(A) → Prove(B)) )
If PA proves itself consistent, then this would mean it’s inconsistent.
Therefore if PA is consistent and PA proves itself consistent, then it can neither prove nor disprove P.
Now, we will show that if PA is consistent and proves itself consistent then it must prove P. Don’t ask me how I came up with this: I just filled out a napkin with failed approaches until this proof mysteriously appeared.
|- -Prove(P) <-> P
|- Prove(-P <-> Prove(P))
|- Prove(-P) <-> Prove(Prove(P))
|- Prove(P) → Prove(Prove(P))
|- Prove(P) → Prove(-P)
|- Prove(-P) → (Prove(P) → Prove(F))
|- Prove(P) → (Prove(P) → Prove(F))
|- Prove(P) → Prove(F) [this will be used in the next exercise—note we haven’t yet used -Prove(F)]
Now, since we assumed |- -Prove(F), we have
|- -Prove(P)
but then
|- P
which is where the previous part comes in to show that then PA is inconsistent.
If we believe (aka the metatheory proves) PA is consistent, then we believe PA can’t prove itself consistent. Secondly, since we think PA can’t prove a contradiction, we also think that PA can’t prove P, and hence (by the fixed point) that P is true.
I think the fundamental idea is to get Prove(P) ‘through some other means’, and specifically here we apply Prove to (thing that takes in Prove(P)) to get an expression that itself has Prove(P), and so we can go back up and substitute it into the machine.
That is, think of the constructive version of the fixpoint theorem: there’s a function that takes in Prove(P) and spits out something with P in it. By applying Prove to this function, we get something that itself has an “essentially new” Prove(P) in there, which allows us to apply the function to get something useful.
11:
Suppose we have |- Prove(P) → P. By the diagonal lemma we know there’s some Q such that
|- Q <-> (Prove(Q) → Prove(P))
|- Q <-> (Prove(Q) → P) [“Q is true iff a proof of it implies P”]
|- Prove(Q) → (Prove(Prove(Q)) → Prove(P)) [“if I can prove Q, then a proof that I can prove Q gives me a proof of P”]
|- Prove(Q) → Prove(P) (using |- Prove(Q) → Prove(Prove(Q)))
|- Prove(Q) → P (chaining with the assumption |- Prove(P) → P)
|- Q (by the <- direction of Q’s fixed-point property)
|- Prove(Q)
|- Prove(P)
|- P
12: This was much, much easier than I assumed it would be.
Say F is our phi, so that for all P, |- F(P) <-> P holds. Then by the diagonal lemma, there’s a P s.t.
|- -F(P) <-> P
But then |- P <-> F(P) gives us a contradiction.
13: TODO