# Dacyn (David Simmons)

Karma: 375
• FTL communication is not ruled out by the Schrödinger equation, but this is irrelevant because the Schrödinger equation is not valid for systems which include fast-moving particles. Instead, you have to use quantum field theory, of which the Schrödinger equation is the limit as the speed of light approaches infinity. In QFT, FTL communication is indeed ruled out by the formalism, as you suggest. Specifically, it’s the commutativity or anticommutativity of field operators based at points which are spacelike separated that does it. For further details I would suggest reading the short paper of Eberhard and Ross. (Unfortunately you need an institutional affiliation to view the link, but I can send a PDF to anyone who wants it.)

• I’d thought the Hilbert space was uncountably dimensional because the number of functions of a real line is uncountable. But in QM it’s countable… because everything comes in multiples of Planck’s constant, perhaps? Though I haven’t seen the actual reason stated, and perhaps it’s something beyond my current grasp.

Ahh… here’s something I can help with. To see why Hilbert space has a countable basis, let’s first define Hilbert space. So let

$L^2$ = the set of all functions $f$ such that the integral of $|f|^2$ is finite, and let

$N$ = the set of all functions $f$ such that the integral of $|f|^2$ is zero. This includes for example the Dirichlet function, which is one on rational numbers but zero on irrational numbers. So it’s actually a pretty big space.

Hilbert space is defined to be the quotient space $L^2/N$. To see that it has a countable basis, it suffices to show that it contains a countable dense set. Then the Gram–Schmidt orthogonalization process can turn that set into a basis. What does it mean to say that a set is dense? Well, the metric on Hilbert space is given by the formula

$$\mathrm{dist}(f,g) = \sqrt{\int |f - g|^2},$$

so a sequence is dense if for every element $f$ of Hilbert space, you can find a sequence $f_n$ such that $\mathrm{dist}(f_n, f) \to 0$. Now we can see why we needed to mod out by $N$ -- any two points of $N$ are considered to have distance zero from each other!

So what’s a countable dense sequence? One sequence that works is the sequence of all piecewise-linear continuous functions with finitely many pieces whose vertices are rational numbers. This class includes for example the function defined by the following equations:

$f(x) = 0$ for all $x < -1/2$

$f(x) = (1/2 + x)/3$ for all $-1/2 < x < 0$

$f(x) = (1/2 - x)/3$ for all $0 < x < 1/2$

$f(x) = 0$ for all $x > 1/2$

Note that I don’t need to specify what $f$ does if I plug in a number in the finite set $\{-1/2, 0, 1/2\}$, since any function $g$ which is zero outside of that set is an element of $N$, so $f + g$ would represent the same element of Hilbert space as $f$.
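As a sanity check, here is a minimal numerical sketch (my own illustration, not part of the original comment; all function names are ad hoc) showing that piecewise-linear interpolants with equally spaced rational vertices approach a non-piecewise-linear function in the metric $\mathrm{dist}(f,g) = \sqrt{\int |f-g|^2}$ defined above:

```python
import numpy as np

def l2_dist(f, g, a=-1.0, b=1.0, samples=200_000):
    """Riemann-sum approximation of sqrt(integral of |f - g|^2 over [a, b])."""
    x = np.linspace(a, b, samples, endpoint=False)
    return np.sqrt(np.sum((f(x) - g(x)) ** 2) * (b - a) / samples)

def pl_approx(f, a, b, k):
    """Piecewise-linear interpolant of f with k+1 equally spaced (rational) vertices."""
    xs = np.linspace(a, b, k + 1)
    ys = f(xs)
    return lambda x: np.interp(x, xs, ys)

target = lambda x: np.sqrt(np.abs(x))  # square-integrable, but not piecewise linear

# Distance to the target shrinks as the number of pieces grows.
errors = [l2_dist(target, pl_approx(target, -1.0, 1.0, k)) for k in (2, 8, 32)]
assert errors[0] > errors[1] > errors[2]
```

The cusp of $\sqrt{|x|}$ at zero is the slowest part to approximate, but refining the (rational) vertex grid still drives the $L^2$ distance to zero.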

So to summarize:

1. The uncountable set that you would intuitively think is a basis for Hilbert space, namely the set of functions which are zero except at a single value where they are one, is in fact not even a sequence of distinct elements of Hilbert space, since all these functions are elements of $N$, and are therefore considered to be equivalent to the zero function.

2. The actual countable basis for Hilbert space will look much different, and the Gram–Schmidt process I alluded to above doesn’t really let you say exactly what the basis looks like. For Hilbert space over the unit interval, there is a convenient way to get around this, namely Parseval’s theorem, which states that the sequences $f_n(x) = \cos(2\pi n x)$ and $g_n(x) = \sin(2\pi n x)$ form a basis for Hilbert space. For Hilbert space over the entire real line, there are some known bases but they aren’t as elegant, and in practice we rarely need an explicit countable basis.
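Here is an illustrative numerical sketch (mine, with an arbitrarily chosen test function) of this basis at work: partial sums in the constant-plus-$\cos(2\pi nx)$-plus-$\sin(2\pi nx)$ system approximate a step function on $[0,1]$, with the $L^2$ error shrinking as more basis elements are included:

```python
import numpy as np

# Discretize [0, 1) and pick a square-integrable target: a step function.
x = np.linspace(0.0, 1.0, 20_000, endpoint=False)
dx = x[1] - x[0]
f = np.where(x < 0.5, 1.0, 0.0)

def partial_sum(n_max):
    """Projection of f onto the span of {1, cos(2*pi*n*x), sin(2*pi*n*x) : n <= n_max}."""
    s = np.full_like(x, np.sum(f) * dx)  # constant (mean) term
    for n in range(1, n_max + 1):
        c = 2 * np.sum(f * np.cos(2 * np.pi * n * x)) * dx
        d = 2 * np.sum(f * np.sin(2 * np.pi * n * x)) * dx
        s += c * np.cos(2 * np.pi * n * x) + d * np.sin(2 * np.pi * n * x)
    return s

def l2_error(n_max):
    return np.sqrt(np.sum((f - partial_sum(n_max)) ** 2) * dx)

# Each extra odd frequency contributes a nonzero coefficient, so the error drops.
errs = [l2_error(n) for n in (1, 4, 16, 64)]
assert errs[0] > errs[1] > errs[2] > errs[3]
```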

3. Finally, the philosophical aspect: Having a countable basis means that elements of Hilbert space can be approximated arbitrarily well by elements which take only a finite amount of information to describe*, much like real numbers can be approximated by rational numbers. This means that an infinite set atheist should be much more comfortable with countable-basis Hilbert space than with uncountable-basis Hilbert space, where such approximation is impossible.

* The general rule is:

Elements of a finite set require a finite and bounded amount of information to describe.

Elements of a countable set require a finite but unbounded amount of information to describe.

Elements of an uncountable set (of the cardinality of the continuum) require a countable amount of information to describe.

• From the decoherent point of view, the no-communication theorem is fairly simple (if you are comfortable with tensor products*). Suppose that Alice and Bob are studying the two quantum systems $A$ and $B$, whose state spaces are represented by Hilbert spaces $H_A$ and $H_B$, respectively. Then the state space of the joint system is $H := H_A \otimes H_B$. Now suppose that Alice makes a measurement on** system $A$, and Bob makes a measurement on system $B$. These measurements are represented physically by unitary transformations $U_A : H_A \rightarrow H_A$ and $U_B : H_B \rightarrow H_B$. The effects of the measurements on the joint system are therefore represented by the unitary transformations $V_A = U_A \otimes I_B$ and $V_B = I_A \otimes U_B$, where $I_A$ and $I_B$ are the identity transformations on $H_A$ and $H_B$, respectively. The key to the no-communication theorem is the observation that the transformations $V_A$ and $V_B$ commute with each other. (Either way you take the product, you get $U_A \otimes U_B$.) This implies that if we do our calculations assuming that Alice did her measurement first, then we will get the same answers as if we do our calculations assuming that Bob did his measurement first. So let’s do our calculations assuming that Bob measured first, as it will be easier to analyze that way.
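The commutation observation is easy to check numerically. Below is a small sketch of my own (dimensions, unitaries, and the state are arbitrary illustrations): $V_A$ and $V_B$ commute, and Bob’s outcome probabilities are unchanged by Alice’s unitary.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(n):
    """A random n x n unitary, taken from the QR decomposition of a Gaussian matrix."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

dA, dB = 2, 3
U_A, U_B = random_unitary(dA), random_unitary(dB)
V_A = np.kron(U_A, np.eye(dB))  # Alice's action on the joint system H_A (x) H_B
V_B = np.kron(np.eye(dA), U_B)  # Bob's action on the joint system

# Either order of multiplication gives U_A (x) U_B, so V_A and V_B commute.
assert np.allclose(V_A @ V_B, V_B @ V_A)
assert np.allclose(V_A @ V_B, np.kron(U_A, U_B))

# A joint (generally entangled) unit state.
psi = rng.normal(size=dA * dB) + 1j * rng.normal(size=dA * dB)
psi /= np.linalg.norm(psi)

def bob_marginal(state):
    """Probabilities of Bob's basis outcomes: sum |amplitude|^2 over Alice's index."""
    return np.sum(np.abs(state.reshape(dA, dB)) ** 2, axis=0)

# Alice's measurement cannot change what Bob expects to see.
assert np.allclose(bob_marginal(psi), bob_marginal(V_A @ psi))
```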

After Bob makes his measurement, the amplitude of the universe is split up into two blobs, one corresponding to Bob recording Possible Outcome 1 and another corresponding to Bob recording Possible Outcome 2. The size of these blobs, as measured by square-integrating, is independent of anything that Alice does (since according to this formulation of the problem, Alice hasn’t done anything yet). Now when Alice makes her measurement, the size of the blobs is preserved because of unitarity. Moreover (and this is the crucial point) the blob corresponding to Outcome 1 gets mapped to another blob corresponding to Outcome 1, and the blob corresponding to Outcome 2 gets mapped to another blob corresponding to Outcome 2. Thus, the final size of the blobs corresponding to the different outcomes is independent of Alice’s choice, and according to the Born probabilities that means Bob’s expectations about his measurement are also independent of Alice’s choice.

The fact that outcomes are preserved under Alice’s action is worth remarking further on. Intuitively, it corresponds to the fact that recorded measurements don’t erase themselves randomly. Scientifically, it corresponds to the complicated phenomenon known as decoherence, which is much harder to describe rigorously than the no-communication theorem is. Philosophically, it corresponds to the fact about the world that the Copenhagen interpretation thinks of as an assumption, and which many-worlders think too complicated to be considered a fundamental assumption of physics.

* For those not familiar with tensor products, they are the mathematical objects Eliezer is implicitly talking about whenever he writes things like “(Human-LEFT Sensor-LEFT Atom-LEFT) + (Human-RIGHT Sensor-RIGHT Atom-RIGHT)”. A working definition is that the tensor product of an M-dimensional space with an N-dimensional space is an MN-dimensional space.

** And/or a modification to system $A$; the composition of any number of measurements and modifications will always be represented by a unitary transformation.

A final remark: The no-communication theorem, as I’ve sketched it above, shows that entangled but noninteracting particles cannot be used for distant communication. It says nothing about faster-than-light communication, as it does not make the connection between the ability of particles to interact and the speed of light, a connection which requires more formalism. The fact that FTL communication is impossible is a theorem of quantum field theory, the relativistic version of quantum mechanics. The basic idea is that the evolution operators corresponding to spacelike separated regions of spacetime will commute, allowing the above argument to take place with $V_A$ and $V_B$ replaced by more realistic operators.

• 4 Aug 2014 0:15 UTC
0 points

By the way, has anyone else noticed that math symbols don’t always work in LessWrong markup? I originally posted code which I had compiled from LaTeX to markup at the suggested website and then double-checked the markup output at http://markdownr.com/, but when I posted here there were errors which didn’t come up on either of the previous sites. (I think I’ve fixed all the errors now though...)

This would be a lot less annoying if it were possible to preview a comment before posting it...

• Good question! The Dirac delta distributions are a basis in a certain sense, but not in the sense that I was talking about in my previous comment (which is the sense in which mathematicians and physicists say that “the Hilbert space of quantum mechanics has a countable basis”). I realize now that I should have been more clear about what kind of basis I was talking about, which is an orthonormal basis—each element of the basis is a unit vector, and the lines spanned by distinct basis elements meet at right angles. Implicit in this formulation is the assumption that elements of the basis will be elements of Hilbert space. This is why the Dirac delta distributions are not a basis in this sense—they are not elements of Hilbert space; in fact they are not even functions but are rather generalized functions. Physicists also like to say that they are “non-normalizable” in the sense that “no scalar multiple of a delta function is a unit vector”—illustrating the failure of the criterion of orthonormality in a more direct way.

The sense in which the Dirac delta distributions are a basis is that any element of Hilbert space can be written as an integral combination of them:

$$f = \int f(x)\,\delta_x \, dx$$

(Both sides of this equation are considered in the distributional sense, so what this formula really means is that for any function $g$,

$$\int fg = \int f(x) \left( \int g\,\delta_x \right) dx,$$

which is a tautology.) This is of course a very different statement from the notion of orthonormal basis discussed above.

So what are some differences between these two notions of bases?

1. Orthonormal bases have the advantage that any two orthonormal bases have the same cardinality, allowing dimension to be defined consistently. By contrast, if one applies a Fourier transform to Hilbert space on [0,1], one gets Hilbert space on the integers; but the former has an uncountable basis of Dirac delta functions while the latter has a countable basis of Dirac delta functions. The Fourier transform is a unitary transformation, so intuitively that means it shouldn’t change the dimension (or other properties) of the Hilbert space. So the size of the Dirac delta basis is not a good way of talking about dimension.

2. Orthonormal bases take the point of view that Hilbert space is an abstract geometric object, whose properties are determined only by its elements and the distances between them as defined by the distance function I described in my previous comment. By contrast, Dirac delta bases only make sense when you go back and think of the elements of Hilbert space as functions again. Both these points of view can be useful. A big advantage of the abstract approach is that it means that unitary transformations will automatically preserve all relevant properties (e.g. Fourier transform preserving dimension as noted above).

So to summarize, both bases are useful, but the orthonormal basis is the right basis with respect to which to ask and answer the question “What is the dimension of Hilbert space?”
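The unitarity of the Fourier transform mentioned in point 1 can be checked numerically. The sketch below (my own illustration; it uses the equivalent complex-exponential form of the Fourier coefficients, and the test function is arbitrary) verifies Parseval’s identity: the $L^2([0,1])$ norm of a function equals the $\ell^2(\mathbb{Z})$ norm of its coefficient sequence.

```python
import numpy as np

# Discretize [0, 1) and pick a smooth test function.
x = np.linspace(0.0, 1.0, 4096, endpoint=False)
dx = x[1] - x[0]
f = x * (1 - x)

# Fourier coefficients c_n = integral of f(x) * exp(-2*pi*i*n*x) dx, truncated.
coeffs = np.array(
    [np.sum(f * np.exp(-2j * np.pi * n * x)) * dx for n in range(-200, 201)]
)

norm_function = np.sum(np.abs(f) ** 2) * dx  # ||f||^2 in L^2([0,1])
norm_sequence = np.sum(np.abs(coeffs) ** 2)  # ||c||^2 in l^2(Z), truncated tail

# Parseval: the unitary Fourier transform preserves the norm (up to truncation
# and discretization error).
assert abs(norm_function - norm_sequence) < 1e-5
```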

• The wiki link to the RationalWiki page reproducing Roko’s original post does not work for me. It works if I replace https:// by http://.

• Depending on your definition of “elegant”, there are probably no famous unsolved math problems with elegant proofs. For example, I would be surprised if any (current) famous unsolved math problems have proofs that could easily be understood by a lay audience.

• Riemannian geometry is not an axiomatic geometry in the same way that Euclidean geometry is, so it is not true that “the only difference is a single axiom.” I think you are thinking of hyperbolic geometry. In any case, the geometry of spacetime according to the theory of general relativity is not any of these geometries, but it is instead a Lorentzian geometry. (I say “a” because the words “Riemannian” and “Lorentzian” both refer to classes of geometries rather than a single geometry—for example, Euclidean geometry and hyperbolic geometry are both examples of Riemannian geometries.)

• Special relativity is good enough for most purposes, which means that (a time slice of) the real universe is very nearly Euclidean. So if you are going to explain the geometry of the universe to someone, you might as well just say “very nearly Euclidean, except near objects with very high gravity such as stars and black holes”.

I don’t think it’s helpful to compare with Euclid’s postulates; they reflect a very different way of thinking about geometry than modern differential geometry.

• From your post it sounds like you in fact do not have a clear picture of infinity in your head. I have a feeling this is true for many people, so let me try to paint one. Throughout this post I’ll be using “number” to mean “positive integer”.

Suppose that there is a distinction we can draw between certain types of numbers and other types of numbers. For example, we could make a distinction between “primes” and “non-primes”. A standard way to communicate the fact that we have drawn this distinction is to say that there is a “set of all primes”. This language need not be construed as meaning that all primes together can be coherently thought of as forming a collection (though it often is construed that way, usually pretty carelessly); the key thing is just that the distinction between primes and non-primes is itself meaningful. In the case of primes, the fact that the distinction is meaningful follows from the fact that there is an algorithm to decide whether any given number is prime.

Now for “infinite”: A set of numbers is called infinite if for every number N, there exists a number greater than N in the set. For example, Euclid proved that the set of primes is infinite under this definition.
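Here is a concrete illustration (mine, not from the comment) of that definition: “the set of primes is infinite” means that for every number N there is a prime greater than N. The linear search below always terminates, since by Euclid’s classical argument some prime divides N! + 1 and is therefore greater than N.

```python
def is_prime(n):
    """Trial-division primality test -- an algorithm witnessing that the
    distinction between primes and non-primes is meaningful."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_above(N):
    """Return a prime greater than N; exists by Euclid's infinitude argument."""
    p = N + 1
    while not is_prime(p):
        p += 1
    return p

assert all(prime_above(N) > N and is_prime(prime_above(N)) for N in (1, 10, 1000))
```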

Now this definition is a little restrictive in terms of mathematical practice, since we will often want to talk about sets that contain things other than numbers, but the basic idea is similar in the general case: the semantic function of a set is provided not by the fact that its members “form a collection” (whatever that might mean), but rather by the fact that there is a distinction of some kind (possibly of the kind that can be determined by an algorithm) between things that are in the set and things that are not in the set. In general a set is “infinite” if for every number N, the set contains more than N members (i.e. there are more than N things that satisfy the condition that the set encodes).

So that’s “infinity”, as used in standard mathematical practice. (Well, there’s also a notion of “infinity” in real analysis which essentially is just a placeholder symbol for “a really large number”, but when people talk about the philosophical issues behind infinity it is usually about the definition I just gave above, not the one in real analysis, which is not controversial.) Now, why is this at all controversial? Well, note that to define it, I had to talk about the notion of distinctions-in-general, as opposed to any individual distinction. But is it really coherent to talk about a notion of distinctions-in-general? Can it be made mathematically precise? This is really what the philosophical arguments are all about: what kinds of things are allowed to count as distinctions. The constructivists take the point of view that the only things that should be allowed to count as distinctions are those that can be computed by algorithms. There are some bullets to bite if you take this point of view though. For example, the twin prime conjecture states that for every number N, there exists p > N such that both p and p+2 are prime. Presumably this is either true or false, even if nobody can prove it. Moreover, presumably each number N either is or is not a counterexample to the conjecture. But then it would seem that it is possible to draw a distinction between those N which satisfy the conclusion of the conjecture, and those which are counterexamples. Yet this is false according to the constructive point of view, since there is no algorithm to determine whether any given N satisfies the conclusion of the conjecture.

I guess this is probably long enough already given that I’m replying to a five-year-old post… I could say more on this topic if people are interested.

• Why do you think that the axiomatic formulation of ZFC “should have meant an end” to the stance that ZFC makes claims that are epistemologically indefensible? Just because I can formalize a statement does not make that statement true, even if it is consistent. Many people (including me and apparently Eliezer, though I would guess that my views are different from his) do not think that the axioms of ZFC are self-evident truths.

In general, I find the argument for Platonism/​the validity of ZFC based on common acceptance to be problematic because I just don’t think that most people think about these issues seriously. It is a consensus of convenience and inertia. Also, many mathematicians are not Platonists at all but rather formalists—and constructivism is closer to formalism than Platonism is.

• I think my original sentence is correct; there is no known algorithm that provably outputs the answer to the question “Does N satisfy the conclusion of the conjecture?” given N as an input. To do this, an algorithm would need to do both of the following: output “Yes” if and only if N satisfies the conclusion, and output “No” if and only if N does not satisfy the conclusion. There are known algorithms that do the first but not the second (unless the twin prime conjecture happens to be true).

• So let me see if I’ve got this straight.

Computer Scientists: For some problems, there are random algorithms with the property that they succeed with high probability for any possible input. No deterministic algorithm for these problems has this property. Therefore, random algorithms are superior.

Eliezer: But if we knew the probability distribution over possible inputs, we could create a deterministic algorithm with the property.

Computer Scientists: But we do not know the probability distribution over possible inputs.

Eliezer: Never say “I don’t know”! If you are in a state of ignorance, use an ignorance prior.

Now of course the key question is what sort of ignorance prior we should use. In Jaynes’s book, usually ignorance priors with some sort of nice invariance properties are used, which makes calculations simpler. For example, if the input is a bit stream then we could assume that the bits are independent coinflips. However, in real life this does not correspond to a state of ignorance, but rather to a state of knowledge where we know that the bit stream does not contain predictable correlations. For example, the probability of 1000 zeros in a row according to this ignorance prior is $2^{-1000} \approx 10^{-301}$, which is not even remotely close to the intuitive probability.
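The arithmetic behind that figure is a one-liner (a sketch of my own, just spelling out the exponent):

```python
from math import log10

# The "independent fair coinflips" prior assigns probability 2**-1000 to any
# particular 1000-bit string, including 1000 zeros in a row.
log10_p = -1000 * log10(2)

# log10(2**-1000) is about -301.03, i.e. the probability is roughly 10**-301.
assert -302 < log10_p < -301
```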

The next step is to try to create an ignorance prior which somehow formalizes Occam’s razor. As anyone familiar with MIRI’s work on the problem of logical priors should know, this is more difficult than it sounds. Essentially, the best solution so far (see “Logical Induction”, Garrabrant et al. 2016) is to create a list of all of the ways the environment could contain predictable correlations (or in the language of this post, resemble an “adversarial telepath”), and then trade them off of each other to get a final probability. One of the main downsides of the algorithm is that it is not possible to list all of the possible ways the environment could be correlated (since there are infinitely many), so you have to limit yourself to taking a feasible sample.

Now, it is worth noting that the above paper is concerned with an “environment” which is really just the truth-values of mathematical statements! It is hard to see how this environment resembles any sort of “adversarial telepath”. But if we want to maintain the ethos of this post, it seems that we are forced to this conclusion. Otherwise, an environment with logical uncertainty could constitute a counterexample to the claim that randomness never helps.

To be precise, let f be a mathematical formula with one free variable representing an integer, and suppose we are given access to an oracle which can tell us the truth-values of the statements f(1),...,f(N). The problem is to compute (up to a fixed accuracy) the proportion of statements which are true, with the restriction that we can only make n queries, where n << N. Monte Carlo methods succeed with failure probability exponential in n, regardless of what f is.
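The Monte Carlo claim can be sketched as follows (my own illustration: the “oracle” here is a stand-in truth table, not an actual formula evaluator; by Hoeffding’s inequality, the probability that the estimate is off by more than eps is at most 2·exp(−2·n·eps²), regardless of what f is):

```python
import random

random.seed(0)

N = 10**6
# Stand-in oracle for the truth-values of f(1), ..., f(N); true proportion ~1/3.
truth = [i % 3 == 0 for i in range(1, N + 1)]

def monte_carlo_estimate(n):
    """Estimate the proportion of true statements using n random oracle queries."""
    return sum(truth[random.randrange(N)] for _ in range(n)) / n

# With n = 10,000 queries (n << N), the estimate is close to 1/3 with
# overwhelming probability -- the failure probability is exponential in n.
est = monte_carlo_estimate(10_000)
assert abs(est - 1 / 3) < 0.03
```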

Now suppose that f is determined by choosing m bits randomly, where m << n, and interpreting them as a mathematical formula (throwing out the results and trying again if it is impossible to do so). Then if the minimum failure probability is nonzero, it is exponential in m, not n, and therefore bigger than the failure probability for Monte Carlo methods. However, any algorithm which can be encoded in less than ~m bits fails with nonzero probability for diagonalization reasons.

In fact, the diagonalization failure is one of the least important ones, the main point is that you just don’t know enough about the environment to justify writing any particular algorithm. Any deterministically chosen sequence has a good chance of being correlated with the environment, just because the environment is math and math things are often correlated. Now, we can call this an “adversarial telepath” if we want, but it seems to occur often enough in real life that this designation hardly seems to “carve reality at its joints”.

TL;DR: If even math can be called an “adversarial telepath”, the term seems to have lost all its meaning.

• 26 Nov 2016 1:22 UTC
0 points

The investigation of the systems implied by a set of axioms also requires some assumptions. For example, one must assume that any axiom implies itself, i.e. P → P. Once this axiom is accepted, there are a great number of logical axioms which are equally plausible.

• I am going to take some license with your question because I think you are asking the wrong question. Arbitrary topological spaces and abstract continuity are rarely the right notions in real-world situations. Rather, uniform continuity on bounded sets usually better corresponds to the intuitive notion of “a small change in input produces a small change in output”.

Thus, suppose that is a complete separable metric space and that is uniformly continuous on bounded sets. Then we can show that there exists a function which is uniformly continuous on bounded sets but not a fiber of (i.e. there is no such that for all ). Indeed, consider two cases:

1. is uniformly discrete. Then every map from to is uniformly continuous, so we get a contradiction from cardinality considerations.

2. is not uniformly discrete. Then for each n, since f is uniformly continuous on it has a modulus of continuity on this set, i.e. a continuous increasing function such that for all . Since is not uniformly discrete, there is a function such that for infinitely many , there exist such that . (To construct it, take pairs with , extract a subsequence that behaves geometrically nicely, and then find a function such that for all in the subsequence.) Clearly, cannot be a fiber of .

• Ah, you’re right. The proof can be fixed by changing the division between the two cases. So here is the new proof, with more details added regarding the construction of :

1. is uniformly discrete for all . Then every map from to is uniformly continuous on bounded sets, so we get a contradiction from cardinality considerations.

2. is not uniformly discrete for some . Then for each , since f is uniformly continuous on it has a modulus of continuity on this set, i.e. a continuous increasing function such that for all . Since is not uniformly discrete, there exist such that and . We can extract a subsequence such that if and , then for all , and for all and , either (A) or (B) . By extracting a further subsequence we can assume that which of (A) or (B) holds depends only on and not on . By swapping and if necessary we can assume that case (A) always holds. Now let be an increasing continuous function such that for all . Finally, let . Then for all we have but . Clearly, cannot be a fiber of .

Regarding the appropriateness of metric spaces /​ uniform continuity rather than topological spaces /​ abstract continuity, here are some of the reasons behind my intuition here (developed working in mathematical analysis, specifically Diophantine approximation, and also constructive mathematics):

1. The obvious: metric spaces are explicitly meant to represent the intuitive notion of alikeness as a quantitative concept (i.e. distance), whereas topological spaces have no explicit notion of alikeness.

2. In computability theory, one is interested in the question of how to computationally represent a point or an approximation to a point in a space. The standard way to do this is via restricting to the class of complete separable metric spaces, fixing a countable dense sequence (assumed to be representative of the structure of the metric space), and defining a computational approximation to a point to be an expression of the form . Since and are integers this expression can be coded as finite data. One then defines a computational encoding of a point to be an infinite bitstream consisting of computational approximations that converge to the point.

In practical applications, in the end you will want everything to be computable. So it makes sense to work in a framework where there are natural notions of computability. I am not aware of any such notions for general topological spaces.

3. Regarding continuity vs uniform continuity in metric spaces, both are saying that if two points are close in the domain, their images are also close. But the latter gives you a straightforward estimate as to how close, whereas the former says that the degree of closeness may depend on one of the points. Now, there are good reasons to consider such dependence, since even natural functions on the real numbers (such as or ) have “singularities” where they are not uniformly continuous.

So the question is whether to modify the notion of uniform continuity to directly account for singularities, or to use the standard definition of continuity instead. But if one works with the standard definition, then most of the time one is really looking for ways to sneak back to uniform continuity, e.g. by using the fact that a continuous function on a compact set is uniformly continuous.

An intuitive way of thinking about the fact that a continuous function on a compact set is uniformly continuous is that the notion of compactness means that there are no singularities present “within the space”. For example, if we go back to the functions or , then the singularity of the first occurs at infinity, while the singularity of the latter occurs at . If we take a compact subset of the domain of either function, then what it really means is that we are avoiding the singularity.

By contrast, non-compactness should mean that there are singularities. In some cases like it is easy to identify what the singularities are. But if we are dealing with spaces that are not locally compact like or an infinite-dimensional Hilbert space, then it is not as clear what the singularities are; there is just a general sense that they are dispersed “throughout the space” (because the space is not locally compact).

But you have to ask yourself, are these singularities real or just imagined? In many cases, imagined. For example, in the theory of Banach spaces continuous linear maps are always uniformly continuous.

What about a map that is not uniformly continuous, like the inversion map in infinite-dimensional Hilbert space? In this case, there is still a singularity—at -- and the definition of continuity needs to reflect that. But it doesn’t help to imagine all sorts of other singularities dispersed throughout the space, because that prevents you from making useful statements like: if are at least away from and , then , where is an absolute constant.

Now the example in the previous paragraph is an example of quantitative continuity, which is stronger than uniform continuity away from singularities. But the point is that it can be seen as an extension of uniform continuity away from singularities.

4. Maybe my last reason will be the most relevant from a naturalized agent perspective. The notion of uniform continuity is important because it introduces the modulus of continuity, which can be viewed as a measure of how continuous a function is. The restriction that an agent must be uniformly continuous can then be thought of in a quantitative sense, with “better” agents being less constrained by it. So a more powerful agent may have a looser (larger) modulus of continuity, because it can react more precisely to different possible inputs.

In this terminology, my proof can be thought of as giving an intuitive reason for why the agent cannot implement every possible policy: the agent has limited resources to distinguish different inputs, so it can only implement those policies that can be implemented with these limited resources.

The obvious followup question would be whether if you restrict your attention to the policies that the agent isn’t prevented from implementing due to its limited resources, then can it implement every possible policy? Or in other words, if you fix a modulus of continuity from the outset, can you include all functions with that modulus of continuity as fibers?

If you allow the every-policy function to have an arbitrary modulus of continuity unrelated to the modulus of continuity you are trying to imitate, then it is not hard to see that this is possible at least for some spaces. (By Arzela-Ascoli the space of functions with a fixed modulus of continuity is compact, so there exists a continuous surjection from to this space.) But this may require greatly increasing the resources that the agent must spend to differentiate inputs. On the other hand, requiring the exact same modulus of continuity seems like too rigid an assumption. So the right question is probably to ask how close can the modulus of continuity of the every-policy function be to the modulus it is trying to imitate.

For this kind of question it is probably better to work with a concrete example rather than trying to prove something in generality, so I will work with the Cantor space with the metric . Suppose we want to imitate all functions such that implies . (I know this is not quite the same as the original question, but I think it is close enough.) If then there are such functions. So if we have a single function that has all of them as fibers, then by the pigeonhole principle there is some ball of the form that contains two such fibers. But then if and are the two fibers, then there exists such that . It follows that if we want to choose such that implies (i.e. the analogue of the assumption on but with replaced by ) then we need .

In conclusion, the required accuracy of is doubly exponential with respect to the required accuracy of . Thus, it is not feasible to implement such a function.
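As a toy illustration of where doubly exponential growth of this kind comes from (this is my own example; the exact count in the argument above depends on the elided formulas): a function determined by n input bits assigns one of 2 outputs to each of the 2^n possible inputs, giving 2^(2^n) functions in total.

```python
# Toy illustration (not the exact count from the argument above):
# counting Boolean functions on n bits gives doubly exponential growth.

def num_boolean_functions(n: int) -> int:
    """Number of functions f: {0,1}^n -> {0,1}: one of 2 outputs
    for each of the 2^n inputs, so 2^(2^n) in total."""
    return 2 ** (2 ** n)

for n in range(1, 5):
    print(n, num_boolean_functions(n))
# 1 4
# 2 16
# 3 256
# 4 65536
```

Packing this many fibers into a space forces some pair of them to lie in the same small ball, which is the pigeonhole step.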

• 1 May 2017 11:10 UTC
LW: 2 AF: 1

I don’t know why my comment doesn’t have a reply button. Maybe it is related to the fact that my comment shows up as “deleted” when I am not logged in.

Sorry, I seem to be getting a little lazy with these proofs. Hopefully I haven’t missed anything this time.

New proof: … We can extract a subsequence such that if and , then for all , and for all and , either (A) and or (B) and . By extracting a further subsequence we can assume that which of (A) or (B) holds depends only on and not on . By swapping and if necessary we can assume that case (A) always holds.

Lemma: For each there is at most one such that .

Proof: Suppose and , with . Then , a contradiction.

It follows that by extracting a further subsequence we can assume that for all .

Now let be an increasing uniformly continuous function such that and for all . Finally, let . Then for all we have . On the other hand, for all we have , for we have , and for we have . Thus . Clearly, cannot be a fiber of . Moreover, since the functions and are both uniformly continuous, so is .

Regarding your responses to my points:

I guess I don’t disagree with what you write regarding my points 1 and 4.

It seems to be harder than expected to explain my intuitions regarding singularities in point 3. Basically, I think abstract continuity came to be considered standard mostly because in concrete applications you have to worry about singularities, which makes uniform continuity a little more technically annoying. But in the kind of problem we are considering here, continuity seems to be genuinely more annoying to deal with than uniform continuity, with little added benefit. It also depends on what kinds of functions you expect to actually come up, which is a heuristic judgement. Anyway, it might not be productive to pursue this line of reasoning further, as our disagreements may just come down to intuitions.

Regarding my point 2: I wasn't very clear when I said that uniform continuity gives you an algorithm. What I meant was that if you have an algorithm for computing the images of points in the dense sequence and an algorithm for computing the modulus-of-continuity function, then uniform continuity gives you an algorithm for the function itself. The function would be the kind of thing I would handle with uniform continuity away from singularities (to fix a definition, let us say that you are uniformly continuous away from singularities if you are uniformly continuous on sets of the form , where is some set of singularities).

In your definition of , I think you mean to write max instead of min. But I see your point, though the example seems a little pathological to me.

Anyway, it seems that you agree that it makes sense to restrict to Polish spaces based on computability considerations, which is part of what I was trying to say in 2.

If you have a locally compact Polish space, then you can find a metric with respect to which the space is proper (i.e. bounded subsets are compact): let , where is compact. With respect to this metric, continuity is the same as uniform continuity on bounded sets, so my proof should work then.

Proposition: Let be a Polish space that is not locally compact. Then (with the compact-open topology) is not first countable.

Proof: Suppose otherwise. Then the function has a countable neighborhood basis of sets of the form where is compact and is open. Since is not locally compact, there exists a point such that is not compact for any . For each , we can choose . Let , and note that is compact. Then is a neighborhood of . But then for some . This contradicts the fact that , since we can find a bump function which is on but at .

It does still seem to me that most of the useful intuition comes from point 4 of my previous comment, though.

• 1 May 2017 11:15 UTC
LW: 2 AF: 1

I will have to think more about the issue of continuity vs. uniform continuity. I suppose my last remaining argument is that Bishop and Bridges' classic book on constructive analysis uses uniform continuity on bounded sets rather than continuity, which suggests that it is probably better at least for constructive analysis. But maybe they did not analyze the issue carefully enough, or maybe the relevant issues here are different for some reason.

To fix the argument that every locally compact Polish space admits a proper metric, let be as before and let if and if . Next, let , where is a countable dense sequence. Then is continuous and everywhere finite. Moreover, if , then and thus is compact. It follows that the metric is proper.
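For comparison, here is one standard way to build a proper metric on a locally compact Polish space (a sketch of my own; the specific functions need not match the ones in the argument above):

```latex
% Sketch (standard construction, my own filling-in): suppose
% $g : X \to [0,\infty)$ is continuous and proper, i.e.
% $\{x : g(x) \le R\}$ is compact for every $R$. Define a new metric
\[
  d'(x,y) \;=\; d(x,y) + \bigl| g(x) - g(y) \bigr| .
\]
% If $B$ is $d'$-bounded, say $d'(x,x_0) \le R$ for all $x \in B$, then
% $g(x) \le g(x_0) + R$ on $B$, so $B$ lies in the compact set
% $\{x : g(x) \le g(x_0) + R\}$. Hence closed $d'$-bounded sets are
% compact, i.e. $d'$ is proper; and since $g$ is continuous, $d'$
% induces the same topology as $d$.
```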

Anyway I have fixed the typo in my previous post.

• First of all, it seems to me that “updateless CDT” and “updateless EDT” are the same for agents with access to their own internal states immediately prior to the decision-theory computation: on an appropriate causal graph, such internal states are the only nodes with arrows leading into the “output of decision theory” node, so if their values are known, then severing those arrows does not affect the computation for updating on an observation of that node’s value. So the counterfactual and conditional probability distributions are the same, and thus CDT and EDT are the same.

Anyway, here is another example where CDT is superior to EDT, without obviously bad agent design. Suppose an agent needs to decide whether to undertake a mission that may either succeed or fail, with utilities:

Not trying: 0

Trying and failing: −1000

Trying and succeeding: 100

The agent initially believes that its probability of success is 50%, but it can perform an expensive computation (−10 utility) to update this probability to either 1% or 99%. In any decision theory, if the new probability is 1% then the agent will not try, and if it is 99% then it will try. However, it also has to decide whether to perform the computation in the first place, and if not, whether to try anyway. Under CDT, the choice is easy: performing the computation gives an expected utility of

0.5(0) + 0.5(0.99(100) − 0.01(1000)) − 10 = 34.5

trying without computation gives an expected utility of

0.5(100) − 0.5(1000) = −450

and not trying without computation gives a utility of 0, so the agent performs the computation.
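The CDT arithmetic above can be checked with a short script (a sketch; the variable names are mine):

```python
# Checking the CDT expected utilities from the example above.
U_NOT_TRY, U_FAIL, U_SUCCEED = 0, -1000, 100
COMPUTE_COST = 10  # utility cost of running the computation

def eu_try(p_success: float) -> float:
    """Expected utility of trying, given a probability of success."""
    return p_success * U_SUCCEED + (1 - p_success) * U_FAIL

# With computation: 50% chance it outputs 1% (then don't try, utility 0),
# 50% chance it outputs 99% (then try), minus the computation cost.
eu_compute = 0.5 * U_NOT_TRY + 0.5 * eu_try(0.99) - COMPUTE_COST
# Trying blindly at the 50% prior.
eu_blind = eu_try(0.5)

print(round(eu_compute, 6))  # 34.5
print(round(eu_blind, 6))    # -450.0
```

Since 34.5 beats both −450 and 0, the CDT agent performs the computation.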

Under EDT, the equilibrium solution cannot be to always perform the computation. Indeed, if it were so, then the expected utility for trying without computation would be

0.99(100) − 0.01(1000) = 89

while the other two expected utilities would be unchanged, so the agent would try without computation. (If the agent observes itself trying, it infers that it must have done so because it computed the probability as 99%, and thus that the probability of success must be 99%.)
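The EDT-side number can be checked the same way (again just a sketch of the arithmetic above):

```python
# If the agent conditions on its own action "try" and infers that the
# computation must have returned 99%, the expected utility it assigns
# to trying without computation is:
eu_edt_try = 0.99 * 100 - 0.01 * 1000
print(round(eu_edt_try, 6))  # 89.0
# This exceeds the 34.5 expected utility of always computing, so
# "always perform the computation" cannot be an EDT equilibrium.
```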

The actual equilibrium would likely be to usually run the computation, but sometimes try without it. But the exact equilibrium is irrelevant; the point is that EDT recommends something different from the correct CDT solution.

Note that the computation cost is in fact irrelevant to the example; I introduced it only to motivate why a decision needs to be made about whether to perform the computation at all. EDT would recommend a non-optimal solution even if there were no cost associated with running the computation.

As before, the key feature of this example is that while computing expected utilities, the agent does not have access to information about its internal states prior to choosing its action: it must choose to either update its priors about them using information from its counterfactual action (EDT) or sever this causal connection before updating (CDT).

As far as I can tell, there is no analogous setup where EDT is preferable to CDT.

Note: The reason the idea of this example can’t be used in the setup of the OP is that in the OP, the inaccessible variable (the utility function) is actually used in the decision theory computation, whereas in my example the inaccessible variable (the probability of success) is inaccessible because it is hard to compute, so it wouldn’t make sense to use it in the computation.