## Iteration Fixed Point Exercises

This is the third of three sets of fixed point exercises. The first post in this sequence is here, giving context.

Note: Questions 1-5 form a coherent sequence and questions 6-10 form a separate coherent sequence. You can jump between the sequences.

1. Let $(X,d)$ be a complete metric space. A function $f:X\to X$ is called a contraction if there exists a $q<1$ such that for all $x,y\in X$, $d(f(x),f(y))\le q\cdot d(x,y)$. Show that if $f$ is a contraction, then for any $x_0\in X$, the sequence $\{x_n=f^n(x_0)\}$ converges. Show further that it converges exponentially quickly (i.e. the distance between the $n$th term and the limit point is bounded above by $c\cdot a^n$ for some $a<1$).

2. (Banach contraction mapping theorem) Show that if $(X,d)$ is a complete metric space and $f$ is a contraction, then $f$ has a unique fixed point.

3. If we only require that $d(f(x),f(y))<d(x,y)$ for all $x\ne y$, then we say $f$ is a weak contraction. Find a complete metric space $(X,d)$ and a weak contraction $f:X\to X$ with no fixed points.

4. A function $f:\mathbb{R}^n\to\mathbb{R}$ is convex if $f(tx+(1-t)y)\le tf(x)+(1-t)f(y)$ for all $t\in[0,1]$ and $x,y\in\mathbb{R}^n$. A function $f$ is strongly convex if you can subtract a positive paraboloid from it and it is still convex (i.e. $f$ is strongly convex if $x\mapsto f(x)-\varepsilon\|x\|^2$ is convex for some $\varepsilon>0$). Let $f$ be a strongly convex smooth function from $\mathbb{R}^n$ to $\mathbb{R}$, and suppose that the magnitude of the second derivative $\|\nabla^2 f\|$ is bounded. Show that there exists an $\varepsilon>0$ such that the function $g:\mathbb{R}^n\to\mathbb{R}^n$ given by $x\mapsto x-\varepsilon(\nabla f)(x)$ is a contraction. Conclude that gradient descent with a sufficiently small constant step size converges exponentially quickly on a strongly convex smooth function.

5. A finite stationary Markov chain is a finite set $S$ of states, along with a probabilistic rule $A:S\to\Delta S$ for transitioning between the states, where $\Delta S$ represents the space of probability distributions on $S$. Note that the transition rule has no memory, and depends only on the previous state. If for any pair of states $s,t\in S$, the probability of passing from $s$ to $t$ in one step is positive, then the Markov chain $(S,A)$ is ergodic. Given an ergodic finite stationary Markov chain, use the Banach contraction mapping theorem to show that there is a unique distribution over states which is fixed under application of the transition rule. Show that, starting from any state $s$, the limit distribution $\lim_{n\to\infty}A^n(s)$ exists and is equal to the stationary distribution.

6. A function $f$ from a partially ordered set to another partially ordered set is called monotonic if $x\le y$ implies that $f(x)\le f(y)$. Given a partially ordered set $(P,\le)$ with finitely many elements, and a monotonic function $f$ from $P$ to itself, show that if $f(x)\ge x$ or $f(x)\le x$, then $f^n(x)$ is a fixed point of $f$ for all $n>|P|$.

7. A complete lattice $(L,\le)$ is a partially ordered set in which each subset of elements has a least upper bound and greatest lower bound. Under the same hypotheses as the previous exercise, extend the notion of $f^n(x)$ for natural numbers $n$ to $f^\alpha(x)$ for ordinals $\alpha$, and show that $f^\alpha(x)$ is a fixed point of $f$ for all $x\in L$ with $f(x)\le x$ or $f(x)\ge x$ and all $|\alpha|>|L|$ ($|A|\le|B|$ means there is an injection from $A$ to $B$, and $|A|>|B|$ means there is no such injection).

8. (Knaster-Tarski fixed point theorem) Show that the set of fixed points of a monotonic function on a complete lattice themselves form a complete lattice. (Note that since the empty set is always a subset, a complete lattice must be nonempty.)

9. Show that for any set $A$, $(\mathcal{P}(A),\subseteq)$ forms a complete lattice, and that any injective function from $A$ to $B$ defines a monotonic function from $(\mathcal{P}(A),\subseteq)$ to $(\mathcal{P}(B),\subseteq)$. Given injections $f:A\to B$ and $g:B\to A$, construct a subset $A'$ of $A$ and a subset $B'$ of $B$ such that $B'=f(A')$ and $A\setminus A'=g(B\setminus B')$.

10. (Cantor–Schröder–Bernstein theorem) Given sets $A$ and $B$, show that if $|A|\le|B|$ and $|A|\ge|B|$, then $|A|=|B|$. ($|A|\le|B|$ means there is an injection from $A$ to $B$, and $|A|=|B|$ means there is a bijection.)
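As a numerical illustration of the map in the fourth exercise (a sanity check, not a solution), here is a minimal Python sketch. The quadratic `f`, the step size `eps = 0.1`, and the starting points are all assumptions chosen for the example; for this particular `f` the gradient step shrinks each coordinate by a factor of at most 0.9.

```python
import math

def grad_f(x):
    """Gradient of the assumed example f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return (x[0], 10.0 * x[1])

def g(x, eps=0.1):
    """One gradient descent step x -> x - eps * grad f(x)."""
    gx = grad_f(x)
    return (x[0] - eps * gx[0], x[1] - eps * gx[1])

def dist(x, y):
    """Euclidean distance in the plane."""
    return math.hypot(x[0] - y[0], x[1] - y[1])

# Two arbitrary (assumed) starting points.
x, y = (3.0, -2.0), (-1.0, 4.0)

# For this f and eps, one step shrinks distances by a factor of at most 0.9.
ratio = dist(g(x), g(y)) / dist(x, y)

# Iterating g drives x toward the minimizer (0, 0).
for _ in range(200):
    x = g(x)
```

Other strongly convex functions need their own step size; the point of the exercise is to show that some sufficiently small one always works.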

Please use the spoilers feature (the symbol ‘>’ followed by ‘!’ followed by a space) in your comments to hide all solutions, partial solutions, and other discussions of the math. The comments will be moderated strictly to hide spoilers!

I recommend putting all the object level points in spoilers and including metadata outside of the spoilers, like so: “I think I’ve solved problem #5, here’s my solution <spoilers>” or “I’d like help with problem #3, here’s what I understand <spoilers>” so that people can choose what to read.

Tomorrow’s AI Alignment Forum Sequences post will be “Approval-directed agents: overview” by Paul Christiano in the sequence Iterated Amplification.

The next post in this sequence will be released on Saturday 24th November, and will be ‘Fixed Point Discussion’.

Ex 6:

If at any point $f^n(x)=f^{n-1}(x)$, then we’re done. So assume that we get a strict increase each time up to $n=|P|$. Since there are only $|P|$ elements in the entire poset, and $f$ is monotone, $f^{n+1}(x)$ has to equal $f^n(x)$: otherwise $x,f(x),\dots,f^{n+1}(x)$ would be more than $|P|$ distinct elements.

Ex 7:

For a limit ordinal $\alpha$, define $f^\alpha(x)$ as the least upper bound of $f^n(x)$ for all $n<\alpha$. If $|\alpha|>|L|$, then the family $f^n(x)$ for $n<\alpha$ is indexed by a set of size $|\alpha|$ and takes its values in a set of size $|L|$. Since there are no injections between these sets, there must be two ordinals $n<m$ such that $f^n(x)=f^m(x)$. Since $f$ is monotone, that implies that for every ordinal $l>n$, $f^l(x)=f^n(x)$, and thus $f^n(x)$ is a fixed point. Since $n<\alpha$, this proves the exercise.

Ex 8:

Starting from $x$, we can create a fixed point via iteration by taking $|\alpha|>|L|$ and iterating $\alpha$ times, as demonstrated in Ex 7. Call this fixed point $f_x$. Suppose there was a fixed point $k$ such that $x\le k$ and $k\le f_x$. Then at some point $f^n(x)\le f^n(k)=k$, but $f^{n+1}(x)\ge f^{n+1}(k)=k$, which breaks the monotonicity of $f$ unless $k=f_x$. So $f_x$ generated this way is always the smallest fixed point greater than $x$.

Say we have fixed points $x_i$. Then let $x$ be the least upper bound of the $x_i$, and generate the fixed point $f_x$ from it. So $f_x$ will be greater than each of the $x_i$ since $f$ is monotone, and is the smallest such fixed point as shown in the above paragraph. So the poset of fixed points is semi-complete with upper bounds.

Now take our fixed points $x_i$ again. Now let $x$ be the greatest lower bound of the $x_i$, and generate a fixed point $f_x$. Since $x\le x_i$ and $f$ is monotonic, $f^\alpha(x)\le f^\alpha(x_i)=x_i$, and so $f_x$ is a lower bound of the $x_i$. It has to be the greatest such bound because $x$ itself is already the greatest such bound in our poset, and $f$ is monotonic.

Thus the lattice of fixed points has all least upper bounds and all greatest lower bounds, and is thus complete!

#3:

$x\mapsto x+\frac{1}{x}$ on $x\ge 1$ shortens all distances but is strictly monotonic.

#6: (the “show that if” condition follows from the property, the question is likely misstated)

The iteration is so long that it must visit an element twice. We can’t have a cycle in the order so the repetition must be immediate.

Thanks, I actually wanted to get rid of the earlier condition that f(x)≥x for all x, and I did that.

Answer to question 1.

Let $x_{i+1}=f(x_i)$ for arbitrary $x_0$. Call $c=d(x_0,x_1)$. Then by induction $d(x_k,x_{k+1})\le cq^k$, so for $i<j$,

$$d(x_i,x_j)\le\sum_{k=i}^{j-1}d(x_k,x_{k+1})\le\sum_{k=i}^{j-1}cq^k\le\frac{cq^i}{1-q}$$

(power series simplification).

Therefore $\forall\delta>0:\exists n\in\mathbb{N}\ \forall i>n,\,j>i: d(x_i,x_j)\le\frac{cq^n}{1-q}<\delta$, i.e. $x_i$ is a Cauchy sequence. However $(X,d)$ is said to be complete, which by definition means any Cauchy sequence is convergent. So $x_n\to y$ and $d(x_i,y)\le\sup_{j\ge i}d(x_i,x_j)\le\frac{cq^i}{1-q}$. So $x_n$ converges exponentially quickly.
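The bound $d(x_i,y)\le\frac{cq^i}{1-q}$ can be checked numerically. The sketch below is only an illustration under assumptions: it uses `math.cos` on $[0,1]$, which maps that interval into itself and has $|\cos'(x)|\le\sin(1)<1$ there, so $q=\sin(1)$; the starting point and step counts are arbitrary choices.

```python
import math

def iterate(f, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_{i+1} = f(x_i)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

# Assumed example: cos is a contraction on [0, 1] with constant q = sin(1).
q = math.sin(1.0)
xs = iterate(math.cos, 1.0, 200)

c = abs(xs[1] - xs[0])   # c = d(x_0, x_1)
y = xs[-1]               # numerical stand-in for the limit point

def a_priori_bound(i):
    """The bound c * q**i / (1 - q) on d(x_i, y) from the argument above."""
    return c * q**i / (1 - q)

# Check the bound on the first 100 iterates.
bound_holds = all(abs(xs[i] - y) <= a_priori_bound(i) for i in range(100))
```

In this example the iterates actually converge faster than the bound suggests, because the derivative near the limit is smaller than the worst-case `q`.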

Answer to question 2.

From part 1, as $f$ is continuous, $y=\lim_{n\to\infty}x_{n+1}=\lim_{n\to\infty}f(x_n)=f(\lim_{n\to\infty}x_n)=f(y)$. So $y$ is a fixed point. Suppose $x$ and $y$ are both fixed points of a contraction map $f$. Then $f(x)=x$ and $f(y)=y$, so $d(x,y)=d(f(x),f(y))\le q\,d(x,y)$, therefore $d(x,y)=0$, so $x=y$. Thus $f$ has a unique fixed point.

Answer to question 3.

$(\mathbb{R},d(x,y)=|x-y|)$ is a complete metric space: it's the real line with the usual distance. Let $f(x)=\sqrt{1+x^2}$. Then $f$ is a weak contraction because $f$ is differentiable and $f'(x)=\frac{x}{\sqrt{1+x^2}}$ has the property $\forall x:|f'(x)|<1$, so by the mean value theorem $|f(x)-f(y)|<|x-y|$ for $x\ne y$. However no fixed point exists as $\forall x:f(x)>x$. This works because the sequence $x_i$ generated from repeated applications of $f$ will tend to infinity, despite successive terms becoming ever closer.
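The behaviour described here is easy to see numerically. A minimal sketch, assuming the starting point $x_0=0$ (so that $x_n=\sqrt{n}$) and an arbitrary 100 iterations:

```python
import math

def f(x):
    """The weak contraction f(x) = sqrt(1 + x^2) from the answer above."""
    return math.sqrt(1.0 + x * x)

x = 0.0      # assumed starting point
gaps = []    # successive differences x_{n+1} - x_n
for _ in range(100):
    nxt = f(x)
    gaps.append(nxt - x)
    x = nxt

# Starting from 0, x_n = sqrt(n): the iterates grow without bound,
# while the gaps between successive terms keep shrinking.
```

The gaps shrink but never reach zero, which is what rules out a fixed point.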

For Q2, I believe you aren’t done:

You have established that there is at most one fixed point, but not that a fixed point exists.

#6:

Assume WLOG $f(x)\ge x$. Then by monotonicity, we have $x\le f(x)\le f^2(x)\le\dots\le f^{|P|}(x)$. If this chain were all strictly greater, then we would have $|P|+1$ distinct elements. Thus there must be some $k$ such that $f^k(x)=f^{k+1}(x)$. By induction, $f^{n+1}(x)=f^n(x)=f^k(x)$ for all $n>k$.
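The chain argument can be run directly on a small finite poset. The sketch below is an illustration under assumptions: the poset is the subsets of $\{0,1,2,3\}$ ordered by inclusion (so $|P|=16$), and the monotone map `f` is a hypothetical example chosen so that $f(x)\ge x$.

```python
def f(s, n=4):
    """An assumed monotone map on subsets of {0, ..., n-1}: add 0 and every successor."""
    return frozenset(s) | {0} | {i + 1 for i in s if i + 1 < n}

def iterate_to_fixed_point(f, x, size_of_poset):
    """Apply f until the value repeats; return (fixed point, number of steps used)."""
    for step in range(size_of_poset + 1):
        nxt = f(x)
        if nxt == x:
            return x, step
        x = nxt
    raise ValueError("no fixed point reached within |P| + 1 steps")

# Start from the empty set, which satisfies f(x) >= x.
fixed, steps = iterate_to_fixed_point(f, frozenset(), 2 ** 4)
# The chain is {} < {0} < {0,1} < {0,1,2} < {0,1,2,3}, so steps == 4.
```

The loop bound of $|P|+1$ is exactly the pigeonhole bound from the argument: a strictly increasing chain cannot be longer than the poset.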

#7:

Assume $f(x)\ge x$ and construct a chain similarly to (6), indexed by elements of $\alpha$. If all inequalities were strict, we would have an injection from $\alpha$ to $L$.

#8:

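Let $F$ be the set of fixed points. Any subset $S$ of $F$ must have a least upper bound $x$ in $L$. If $x$ is a fixed point, done. Otherwise, note that $f(x)\ge x$ (each $q\in S$ satisfies $q=f(q)\le f(x)$, so $f(x)$ is an upper bound of $S$), and consider $f^\alpha(x)$, which must be a fixed point by (7). For any $q$ in $S$, we have $q=f(q)\le x\Rightarrow f^\alpha(q)\le f^\alpha(x)\Rightarrow q\le f^\alpha(x)$. Thus $f^\alpha(x)$ is an upper bound of $S$ in $F$. To see that it is the least upper bound, assume we have some other upper bound $b$ of $S$ in $F$. Then $x\le b\Rightarrow f^\alpha(x)\le f^\alpha(b)=b$.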

To get the lower bound, note that we can flip the inequalities in L and still have a complete lattice.
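The construction of the least upper bound inside $F$ can be sketched on a finite lattice, where ordinary iteration is enough. Everything in this example is an assumption made for illustration: the lattice is the subsets of $\{1,2,3\}$, and the hypothetical monotone map `f` adds 3 whenever both 1 and 2 are present, so that the union of two fixed points need not be a fixed point.

```python
def f(s):
    """Assumed monotone map on subsets of {1, 2, 3}: 1 and 2 together force 3."""
    s = frozenset(s)
    return s | {3} if {1, 2} <= s else s

def join_in_fixed_points(f, fixed_points, lattice_size):
    """Least upper bound of some fixed points, taken within the set of fixed points."""
    x = frozenset().union(*fixed_points)   # least upper bound in the full lattice
    for _ in range(lattice_size + 1):
        nxt = f(x)                         # x <= f(x), so this chain only grows
        if nxt == x:
            return x
        x = nxt
    raise ValueError("iteration did not stabilise")

# {1} and {2} are fixed points, but their union {1, 2} is not.
join = join_in_fixed_points(f, [frozenset({1}), frozenset({2})], 2 ** 3)
```

Here the join in the lattice of fixed points is $\{1,2,3\}$, strictly larger than the union $\{1,2\}$ taken in the full power set.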

#9:

P(A) clearly forms a lattice where the upper bound of any set of subsets is their union, and the lower bound is the intersection.

To see that injections are monotonic, assume $A_0\subseteq A_1$ and $f$ is an injection. For any function, $f(A_0)\subseteq f(A_1)$. If $a\notin A_0$ and $f(a)\in f(A_0)$, that implies $f(a)=f(a')$ for some $a'\in A_0$, which is impossible since $f$ is injective. Thus $f$ is (strictly) monotonic.

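Now $h:=g\circ f$ is an injection $A\to A$. Let $X$ be the set of all points not in the image of $g$, and let $A'=X\cup h(X)\cup h^2(X)\cup\dots$ Note that $h(A')=h(X)\cup h^2(X)\cup h^3(X)\cup\dots=A'\setminus X$, since no element of $X$ is in the image of $h$. Then $g(B\setminus f(A'))=g(B)\setminus h(A')=g(B)\setminus(A'\setminus X)=(g(B)\setminus A')\cup(g(B)\cap X)=g(B)\setminus A'$. On one hand, every element of $A$ not contained in $g(B)$ is in $A'$ by construction, so $A\setminus A'\subseteq g(B)$. On the other, clearly $g(B)\subseteq A$, so $g(B)\setminus A'\subseteq A\setminus A'$. QED.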

#10:

We form two bijections using the sets from (9), one between $A'$ and $B'$, the other between $A\setminus A'$ and $B\setminus B'$.

Any injection is a bijection between its domain and image. Since $B'=f(A')$ and $f$ is an injection, $f$ is a bijection where we can assign each element $b'\in B'$ to the $a'\in A'$ such that $f(a')=b'$. Similarly, $g$ is a bijection between $B\setminus B'$ and $A\setminus A'$. Combining them, we get a bijection on the full sets.
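To make the construction from (9) and (10) concrete, here is a small Python sketch on one assumed instance: $A=B=$ the non-negative integers with the injections $f(a)=a+1$ and $g(b)=b+1$. Neither is surjective, yet the construction yields a bijection. The helper names `in_A_prime` and `bijection` are hypothetical, and the inverses of `f` and `g` are hard-coded for this instance.

```python
def f(a):
    """Assumed injection A -> B."""
    return a + 1

def g(b):
    """Assumed injection B -> A."""
    return b + 1

def in_A_prime(a):
    """Decide whether a is in A' = X ∪ h(X) ∪ h²(X) ∪ ..., with h = g∘f and X = A − g(B)."""
    while True:
        if a == 0:       # 0 is the only element outside the image of g, so a is in X
            return True
        b = a - 1        # the unique b with g(b) = a
        if b == 0:       # b is outside the image of f, so a is not in the image of h
            return False
        a = b - 1        # the unique a' with h(a') = a; membership of a reduces to a'

def bijection(a):
    """Use f on A' and the inverse of g on the rest of A."""
    return f(a) if in_A_prime(a) else a - 1

values = [bijection(a) for a in range(6)]   # [1, 0, 3, 2, 5, 4]
```

In this instance $A'$ is the even numbers, and the resulting bijection swaps each even number with its odd successor.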

Q1

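We wish to show that the terms of $x_n$ form a Cauchy sequence, which suffices to demonstrate they converge in a complete space. Take $m,n\in\mathbb{N}^+$, and WLOG $m<n$. Then we know from the definition of contraction that $d(x_m,x_n)\le q^m\cdot d(x_0,x_{n-m})$. The factor $d(x_0,x_{n-m})$ is bounded independently of $m$ and $n$: by the triangle inequality, $d(x_0,x_k)\le\sum_{i<k}d(x_i,x_{i+1})\le\frac{d(x_0,x_1)}{1-q}$ for every $k$. So the right-hand side converges to 0 as $m$ increases, and the sequence is Cauchy.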

It’s easy to see that this makes the rate of convergence between terms of the Cauchy sequence exponentially quick. Intuitively that seems like it ought to make the sequence converge to its limit with the same speed, but I don’t think that can be made rigorous without more steps.

Q2

Take a sequence $\{x_n=f^n(x_0)\}$. This converges to some $L$. Suppose $L$ was not a fixed point. Then choose $\epsilon=d(L,f(L))/10$. A sequence $\{x_n\}$ which converges to a limit has, for every $\epsilon$, some $N$ such that $\forall n\ge N: d(x_n,L)<\epsilon$. Then we know that $d(x_N,L)<\epsilon$ but $d(f(x_N),f(L))=d(x_{N+1},f(L))\ge d(L,f(L))-d(x_{N+1},L)>9\epsilon>\epsilon$, contradicting the contraction condition. So there is at least one fixed point, $L$.

Suppose there are two fixed points, f(x)=x, f(y)=y for distinct x and y. If so, d(f(x),f(y))=d(x,y), which again contradicts the contraction condition. So there is at most one fixed point.

Q3

Take as the space $\{n\in\mathbb{R}^+:n\ge 1\}$, with the usual metric. Define $f(x)=\frac{x^2+1}{x}$. This is a weak contraction (toward infinity) and has no fixed points within this space.