an authoritative payoff matrix that X can’t safely calculate xerself.

Why not? Can’t the payoff matrix be “read off” from the “world program” (assuming X isn’t just ‘given’ the payoff matrix as an argument)?

Actually, this is an open problem so far as I know: show that if X is a Naive Decision Theory agent as above, with some analyzable inference module like a halting oracle, then there exists an agent Y written so that X cooperates against Y in a Prisoner’s Dilemma while Y defects.

Let me just spell out to myself what would have to happen in this instance. For definiteness, let’s take the payoffs in prisoner’s dilemma to be $0 (CD), $1 (DD), $10 (CC) and $11 (DC).

Now, if X is going to co-operate and Y is going to defect then X is going to prove “If I co-operate then I get $0”. Therefore, in order to co-operate, X must also prove the spurious counterfactual “If I defect then I get $x” for some negative value of x.

But suppose I tweak the definition of the NDT agent so that whenever it can prove (1) “if output = a then utility >= u” and (2) “if output != a then utility <= u” it will immediately output a. (And if several statements of the forms (1) and (2) have been proved, then the agent searches for them in the order that they were proved.) Note that our agent will quickly prove “if output = ‘defect’ then utility >= $1”. So if it ever managed to prove “if output = ‘co-operate’ then utility = $0” it would defect right away.

Since I have tweaked the definition, this doesn’t address your ‘open problem’ (which I think is a very interesting one) but it does show that if we replace the NDT agent with something only slightly less naive, then the answer is that no such Y exists.

(We could replace Prisoner’s Dilemma with an alternative game where each player has a third option called “nuclear holocaust”, such that if either player opts for nuclear holocaust then both get (say) -$1, and ask the same question as in your note 2. Then even for the tweaked version of X it’s not clear that no such Y exists.)

**ETA:** I’m afraid my idea doesn’t work: the problem is that the agent will *also* quickly prove “if ‘co-operate’ then I receive at least $0.” So if it can prove the spurious counterfactual “if ‘defect’ then receive −1” before proving the ‘real’ counterfactual “if ‘co-operate’ then receive 0” then it will co-operate.

We could patch this up with a rule that said “if we deduce a contradiction from the assumption ‘output = a’ then immediately output a” which, if I remember rightly, is Nesov’s idea about “playing chicken with the inconsistency”. Then on deducing the spurious counterfactual “if ‘defect’ then receive −1” the agent would immediately defect, which could only happen if the agent itself were inconsistent. So if the agent is consistent, it will never deduce this spurious counterfactual. But of course, this is getting even further away from the original “NDT”.
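To make the tweaked rule and the failure in the ETA concrete, here is a rough Python sketch. It is only a toy: there is no real proof search, just a hand-written list of “proved” statements in the order they were found. The names `Bound` and `tweaked_ndt` are made up for illustration, and encoding “if ‘defect’ then receive −1” as an upper bound on “output != co-operate” assumes a game with exactly two actions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Bound:
    """One 'proved' statement (hypothetical encoding).

    kind == "lower": if output == action then utility >= value
    kind == "upper": if output != action then utility <= value
    """
    kind: str
    action: str
    value: float


def tweaked_ndt(proofs: Iterable[Bound]) -> Optional[str]:
    """Scan statements in proof order and output the first action a that has
    both a type-(1) and a type-(2) statement for a common threshold u."""
    lower: Dict[str, float] = {}  # best lower bound when output == action
    upper: Dict[str, float] = {}  # best upper bound when output != action
    for p in proofs:
        if p.kind == "lower":
            lower[p.action] = max(p.value, lower.get(p.action, float("-inf")))
        else:
            upper[p.action] = min(p.value, upper.get(p.action, float("inf")))
        a = p.action
        # lower[a] >= upper[a] means (1) and (2) both hold with u = upper[a].
        if a in lower and a in upper and lower[a] >= upper[a]:
            return a
    return None  # nothing decisive was proved


# The true statements alone: the agent defects.
honest = [
    Bound("lower", "defect", 1),   # if defect then utility >= $1
    Bound("upper", "defect", 0),   # if co-operate then utility = $0
]

# The ETA's failure: the spurious counterfactual is proved first.
spurious_first = [
    Bound("lower", "cooperate", 0),    # if co-operate then utility >= $0
    Bound("upper", "cooperate", -1),   # spurious: if defect then utility = -$1
] + honest

print(tweaked_ndt(honest))          # defect
print(tweaked_ndt(spurious_first))  # cooperate
```

The second call shows why the order of proofs matters: the rule fires on the spurious pair before it ever reaches the true statements.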

[general comment on sequence, not this specific post.]

You have such a strong intuition that no configuration of classical point particles and forces can ever amount to conscious awareness, yet you don’t immediately generalize and say: ‘no universe capable of exhaustive description by mathematically precise laws can ever contain conscious awareness’. Why not? Surely whatever weird and wonderful elaboration of quantum theory you dream up, someone can ask the same old question: “why does this bit that you’ve conveniently labelled ‘consciousness’ actually *have consciousness*?”

So you want to identify ‘consciousness’ with something ontologically basic and unified, with well-defined properties (or else, to you, it doesn’t *really* exist at all). Yet these very things would convince me that you *can’t possibly* have found consciousness given that, in reality, it has ragged, ill-defined edges in time, space, even introspective content.

Stepping back a little, it strikes me that the whole concept of subjective experience has been carefully refined so that it can’t possibly be tracked down to anything ‘out there’ in the world. Kant and Wittgenstein (among others) saw this very clearly. There are many possible conclusions one might draw (Dennett despairs of philosophy and refuses to acknowledge ‘subjective experience’ at all), but I think people like Chalmers, Penrose and yourself are on a hopeless quest.

The comprehension axiom schema (or any other construction that can be used by a proof checker algorithm) isn’t enough to prove all the statements people consider to be inescapable consequences of second-order logic.

Indeed, since the second-order theory of the real numbers is categorical, and since it can express the continuum hypothesis, an oracle for second-order validity would tell us either that CH or ¬CH is ‘valid’.

(“Set theory in sheep’s clothing”.)

But the bigger problem is that we can’t say exactly what makes a “silly” counterfactual different from a “serious” one.

Would it be naive to hope for a criterion that roughly says: “A conditional P ⇒ Q is silly iff the ‘most economical’ way of proving it is to deduce it from ¬P or else from Q.” Something like: “there exists a proof of ¬P or of Q which is strictly shorter than the shortest proof of P ⇒ Q”?

A totally different approach starts with the fact that your ‘lemma 1’ could be proved without knowing anything about A. Perhaps this could be deemed a sufficient condition for a counterfactual to be serious. But I guess it’s not a necessary condition?

Suppose we had a model M that we thought described cannons and cannon balls. M consists of a set of mathematical assertions about cannons

In logic, the technical terms ‘theory’ and ‘model’ have rather precise meanings. If M is a collection of mathematical assertions then it’s a theory rather than a model.

formally independent of the mathematical system A in the sense that the addition of some axiom A0 implies Q, while the addition of its negation, ~A0, implies ~Q.

Here you need to specify that adding A0 or ~A0 doesn’t make the theory inconsistent, which is equivalent to just saying: “Neither Q nor ~Q can be deduced from A.”

Note: if by M you had actually meant a model, in the sense of model theory, then for every well-formed sentence s, either M satisfies s or M satisfies ~s. But then models are abstract mathematical objects (like ‘the integers’), and there’s usually no way to know which sentences a model satisfies.

Perhaps a slightly simpler way would be to ‘run all algorithms simultaneously’ such that each one is slowed down by a constant factor. (E.g. at time t = (2x + 1) * 2^n, we do step x of algorithm n.) When algorithms terminate, we check (still within the same “process” and hence slowed down by a factor of 2^n) whether a solution to the problem has been generated. If so, we return it and halt.

ETA: Ah, but the business of ‘switching processes’ is going to need more than constant time. So I guess it’s not immediately clear that this works.
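Here is a small Python sketch of the schedule itself, under some loud assumptions: each “algorithm” is modelled as a Python generator that yields `None` while working and yields a candidate answer when it has one, and `make_algorithm`, `is_solution` and `toy_algorithm` are names invented for this example. It also treats switching between algorithms as free, which is exactly the cost the ETA worries about, so it says nothing about whether the constant-factor claim survives.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple


def decode(t: int) -> Tuple[int, int]:
    """Invert t = (2x + 1) * 2**n for t >= 1, returning (n, x)."""
    if t < 1:
        raise ValueError("t must be a positive integer")
    n = (t & -t).bit_length() - 1   # number of trailing zero bits of t
    x = (t >> n) // 2               # t >> n is the odd part 2x + 1
    return n, x


def dovetail(
    make_algorithm: Callable[[int], Iterator[Optional[int]]],
    is_solution: Callable[[int], bool],
    max_ticks: int,
) -> Optional[Tuple[int, int]]:
    """At tick t, run one step of algorithm n, where (n, x) = decode(t).

    Returns (solution, tick) for the first candidate passing is_solution,
    or None if nothing is found within max_ticks.
    """
    running: Dict[int, Optional[Iterator[Optional[int]]]] = {}
    for t in range(1, max_ticks + 1):
        n, _x = decode(t)           # for fixed n, x counts up 0, 1, 2, ...
        if n not in running:
            running[n] = make_algorithm(n)
        gen = running[n]
        if gen is None:             # algorithm n has already terminated
            continue
        try:
            out = next(gen)
        except StopIteration:
            running[n] = None
            continue
        if out is not None and is_solution(out):
            return out, t
    return None


def toy_algorithm(n: int) -> Iterator[Optional[int]]:
    """Toy stand-in: works for n steps, then outputs n and halts."""
    for _ in range(n):
        yield None
    yield n


result = dovetail(toy_algorithm, lambda v: v * v == 49, max_ticks=5000)
print(result)  # (7, 1920): algorithm 7 answers at its step x = 7
```

Algorithm n gets one step in every 2^(n+1) ticks, so in this idealised accounting its slowdown is the constant factor 2^(n+1), independent of how long it runs.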

I agree that definitions (and expansions of the language) can be useful or counterproductive, and hence are not immune from criticism. But still, I don’t think it makes sense to play the Bayesian game here and attach probabilities to different definitions/languages being correct. (Rather like how one can’t apply Bayesian reasoning in order to decide between ‘theory 1’ and ‘theory 2’ in my branching vs probability post.) Therefore, I don’t think it makes sense to calculate expected utilities by taking a weighted average over each of the possible stances one can take in the mind-body problem.

I don’t understand the question, but perhaps I can clarify a little:

I’m trying to say that (e.g.) analytic functionalism and (e.g.) property dualism are not like inconsistent statements in the same language, one of which might be confirmed or refuted if only we knew a little more, but instead like different choices of language, which alter the set of propositions that might be true or false.

It might very well be that the expanded language of property dualism doesn’t “do” anything, in the sense that it doesn’t help us make decisions.

Of course, we haven’t had any instances of jarring physical discontinuities *not* being accompanied by ‘functional discontinuities’ (hopefully it’s clear what I mean).

But the deeper point is that the whole presumption that we have ‘mental continuity’ (in a way that transcends functional organization) is an intuition founded on nothing.

(To be fair, even if we accept that these intuitions are indefensible, it remains to be explained where they come from. I don’t think it’s all that “bizarre”.)

Nice sarcasm. So it must be really easy for you to answer my question then: “How would you show that my suggestions are less likely?”

Right?

You really think there is logical certainty that uploading works in principle and your suggestions are exactly as likely as the suggestion ‘uploading doesn’t actually work’?

How would you show that my suggestions are less likely? The thing is, it’s not as though “nobody’s mind has annihilated” is data that we can work from. It’s impossible to have such data except in the first-person case, and even there it’s impossible to know that your mind didn’t annihilate last year and then recreate itself five seconds ago.

We’re predisposed to say that a jarring physical discontinuity (even if afterwards, we have an agent functionally equivalent to the original) is more likely to cause mind-annihilation than no such discontinuity, but this intuition seems to be resting on nothing whatsoever.

The identity of an object is a choice, a way of looking at it. The “right” way of making this choice is the way that best achieves your values.

I think that’s really the central point. The metaphysical principles which either allow or deny the “intrinsic philosophical risk” mentioned in the OP are not like theorems or natural laws, which we might hope some day to corroborate or refute—they’re more like definitions that a person either adopts or does not.

I don’t see either as irrational

I have to part company here: I think it is irrational to attach ‘terminal value’ to your biological substrate (likewise paperclips), though it’s difficult to explain exactly why. Terminal values are inherently irrational, but valuing the continuance of your thought patterns is likely to be instrumentally rational for almost any set of terminal values, whereas placing extra value on your biological substrate seems like it could *only* make sense as a terminal value (except in a highly artificial setting, e.g. Dr Evil has vowed to do something evil unless you preserve your substrate).

Of course this raises the question of why the deferred irrationality of preserving one’s thoughts in order to do X is better than the immediate irrationality of preserving one’s substrate for its own sake. At this point I don’t have an answer.

For any particular proposal for mind-uploading, there’s probably a significant risk that it doesn’t work, but I understand that to mean: there’s a risk that what it produces isn’t functionally equivalent to the person uploaded. Not “there’s a risk that when God/Ripley is watching everyone’s viewscreens from the control room, she sees that uploaded person’s thoughts are on a *different screen* from the original.”

If the rules of this game allow one side to introduce a “small intrinsic philosophical risk” attached to mind-uploading, even though it’s impossible in principle to detect whether someone has suffered ‘arbitrary Searlean mind-annihilation’, then surely the other side can postulate a risk of arbitrary mind-annihilation *unless* we upload ourselves. (Even ignoring the familiar non-Searlean mind-annihilation that awaits us in old age.)

Perhaps a newborn mind has a half-life of only three hours before spontaneously and undetectably annihilating itself.

Excellent.

Perhaps m could serve as a ‘location’, so that you’d be more likely to meet opponents with similar m values to your own.

Thanks, this is all fascinating stuff.

One small suggestion: if you wanted to, there are ways you could eliminate the phenomenon of ‘last round defection’. One idea would be to randomly generate the number of rounds according to an exponential distribution. This is equivalent to having, on each round, a small constant probability that this is the last round. To be honest though, the ‘last round’ phenomenon makes things more rather than less interesting.

Other ways to spice things up would be: to cause players to make mistakes with small probability (say a 1% chance of defecting when you try to co-operate, and vice versa); or have some probability of misremembering the past.
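For what it’s worth, both suggestions are only a few lines each. Below is a minimal Python sketch, not tied to how the actual tournament is implemented: `sample_num_rounds` and `noisy` are hypothetical helper names, and moves are assumed to be the strings `"C"` and `"D"`. Ending each round with a constant probability gives a geometric distribution over match lengths, which is the discrete counterpart of the exponential distribution mentioned above.

```python
import random


def sample_num_rounds(stop_prob: float, rng: random.Random) -> int:
    """Number of rounds when each round is the last with probability stop_prob.

    The result is geometrically distributed with mean 1 / stop_prob, so no
    player can tell from the round number that the match is about to end.
    """
    if not 0.0 < stop_prob <= 1.0:
        raise ValueError("stop_prob must be in (0, 1]")
    rounds = 1
    while rng.random() >= stop_prob:   # continue with probability 1 - stop_prob
        rounds += 1
    return rounds


def noisy(move: str, error_prob: float, rng: random.Random) -> str:
    """Return the intended move, flipped with probability error_prob."""
    if rng.random() < error_prob:
        return "D" if move == "C" else "C"
    return move


rng = random.Random(0)                      # seeded only for reproducibility
length = sample_num_rounds(0.01, rng)       # about 100 rounds on average
played = [noisy("C", 0.01, rng) for _ in range(length)]
print(length, played.count("D"))            # match length and accidental defections
```

Misremembering the past could be handled the same way, by passing each remembered move through `noisy` before showing the history to a strategy.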

Conversely, when we got trolled an unspecified length of time ago, an incompetent crackpot troll who shall remain nameless kept having all his posts and comments upvoted by other trolls.

It would help if there was a restriction on how much karma one could add or subtract from a single person in a given time, as others are suggesting.

What interests me about the Boltzmann brain (this is a bit of a tangent) is that it sharply poses the question of where the boundary of a subjective state lies. It doesn’t seem that there’s any part X of your mental state that couldn’t be replaced by a mere “impression of X”. E.g. an impression of having been to a party yesterday rather than a memory of the party. Or an impression that one is aware of two differently-coloured patches rather than the patches themselves together with their colours. Or an impression of ‘difference’ rather than an impression of differently coloured patches.

If we imagine “you” to be a circle drawn with magic marker around a bunch of miscellaneous odds and ends (ideas, memories etc. but perhaps also bits of the ‘outside world’, like the tattoos on the guy in Memento) then there seems to be no limit to how small we can draw the circle—how much of your mental state can be regarded as ‘external’. But if only the ‘interior’ of the circle needs to be instantiated in order to have a copy of ‘you’, it seems like anything, no matter how random, can be regarded as a “Boltzmann brain”.

You should be able to get it as a corollary of the lemma that given two disjoint convex subsets U and V of R^n (which are non-zero distance apart), there exists an affine function f on R^n such that f(u) > 0 for all u in U and f(v) < 0 for all v in V.

Our two convex sets being (1) the image of the simplex under the F_i : i = 1 … n and (2) the “negative quadrant” of R^n (i.e. the set of points all of whose co-ordinates are non-positive.)