# What a reduction of “could” could look like

By requests from Blueberry and jimrandomh, here’s an expanded repost of my comment which was itself a repost of my email sent to decision-theory-workshop.

*(Wait, I gotta take a breath now.)*

A note on credit: I can only claim priority for the specific formalization offered here, which builds on Vladimir Nesov’s idea of “ambient control”, which builds on Wei Dai’s idea of UDT, which builds on Eliezer’s idea of TDT. I really, really hope to not offend anyone.

*(Whew!)*

Imagine a purely deterministic world containing a purely deterministic agent. To make it more precise, agent() is a Python function that returns an integer encoding an action, and world() is a Python function that calls agent() and returns the resulting utility value. The source code of both world() and agent() is accessible to agent(), so there’s absolutely no uncertainty involved anywhere. Now we want to write an implementation of agent() that would “force” world() to return as high a value as possible, for a variety of different worlds and without foreknowledge of what world() looks like. So this framing of decision theory makes a subprogram try to “control” the output of a bigger program it’s embedded in.

For example, here’s Newcomb’s Problem:

```python
def world():
    box1 = 1000                                  # box1 always contains $1000
    box2 = 0 if agent() == 2 else 1000000        # box2 is empty iff the agent two-boxes
    return box2 + (box1 if agent() == 2 else 0)  # two-boxers also take box1
```

A possible algorithm for agent() may go as follows. Look for machine-checkable mathematical proofs, up to a specified max length, of theorems of the form “agent()==A implies world()==U” for varying values of A and U. Then, after searching for some time, take the biggest found value of U and return the corresponding A. For example, in Newcomb’s Problem above there are easy theorems, derivable even without looking at the source code of agent(), that agent()==2 implies world()==1000 and agent()==1 implies world()==1000000.
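For intuition, here is a toy rendering of that search loop in Python. All names are mine, and real proof search is replaced by a stub: check() tests a candidate theorem by substituting a constant agent into the world, which happens to be sound here only because this world() sees nothing but the agent's return value.

```python
ACTIONS = (1, 2)
UTILITIES = (0, 1000, 1000000, 1001000)

def world_with(agent):
    # Newcomb's Problem, parameterized by the agent so we can substitute
    box1 = 1000
    box2 = 0 if agent() == 2 else 1000000
    return box2 + (box1 if agent() == 2 else 0)

def check(a, u):
    # Stand-in for finding a proof of "agent()==A implies world()==U"
    return world_with(lambda: a) == u

def agent():
    # Enumerate candidate theorems, take the biggest found U,
    # and return the corresponding action A
    best_u, best_a = max((u, a) for a in ACTIONS for u in UTILITIES
                         if check(a, u))
    return best_a
```

Here agent() returns 1, matching the easy theorem that agent()==1 implies world()==1000000.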

The reason this algorithm works is very weird, so you might want to read the following more than once. Even though most of the theorems proved by the agent are based on false premises (because it is obviously logically contradictory for agent() to return a value other than the one it actually returns), the one specific theorem that leads to maximum U must turn out to be correct, because the agent *makes its premise true* by outputting A. In other words, an agent implemented like that cannot derive a contradiction from the logically inconsistent premises it uses, because then it would “imagine” it could obtain arbitrarily high utility (a contradiction implies anything, including that), therefore the agent would output the corresponding action, which would prove the Peano axioms inconsistent or something.

To recap: the above describes a perfectly deterministic algorithm, implementable today in any ordinary programming language, that “inspects” an unfamiliar world(), “imagines” itself returning different answers, “chooses” the best one according to projected consequences, and cannot ever “notice” that the other “possible” choices are logically inconsistent with determinism. Even though the other choices are in fact inconsistent, and the agent has absolutely perfect “knowledge” of itself and the world, and as much CPU time as it wants. (All scare quotes are intentional.)

This is progress. We started out with deterministic programs and ended up with a workable concept of “could”.

Hopefully, results in this vein may someday remove the need for separate theories of counterfactual reasoning based on modal logics or something. This particular result only demystifies counterfactuals about yourself, not counterfactuals in general: for example, if agent A tries to reason about agent B in the same way, it will fail miserably. But maybe the approach can be followed further.

A subtlety: If the proof search finds “agent()==1 implies world()==5” and “agent()==2 implies world()==1000”, then the argument in the post shows that it will indeed turn out that agent()==2 and world()==1000. But the argument does *not* show in any similar sense that “if the agent had returned 1, then world() would equal 5”—in fact this has no clear meaning (other than the one used inside the algorithm, i.e. that the theorem above is found in the proof search). I do not think that this is actually a problem here, i.e. I don’t think it’s possible to construct a world() where we would intuitively say that the algorithm doesn’t optimize [except of course by making the world so complicated that the proofs get too long], but there’s a related situation where something similar *does* play a role:

Suppose that instead of searching only for proofs, you let agent() call a function probability(logicalStatement), and then do expected utility maximization. [To be precise, let both the possible actions and the possible utility values come from a finite set; then you can call probability(...) for all possible combinations, and since different utilities for the same action are mutually exclusive, their probabilities are additive, so the expected utility calculation is straightforward.]
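A toy sketch of this setup (finite actions and utilities; probability() is the hypothetical “mathematical intuition module”, passed in as a plain function; all naming is mine):

```python
def expected_action(actions, utilities, probability):
    # Expected utility maximization over logical statements.
    # Different utilities for the same action are mutually exclusive,
    # so their probabilities simply add.
    def eu(a):
        return sum(u * probability(f"agent()=={a} implies world()=={u}")
                   for u in utilities)
    return max(actions, key=eu)

def evil(statement):
    # A 100% truthful but "evil" intuition module, as in the example below
    believed = {"agent()==1 implies world()==1": 0.99,
                "agent()==2 implies world()==5": 0.99}
    return believed.get(statement, 0.0)
```

With actions (1, 2) and utilities (1, 5, 1000), expected_action picks 2 (expected utility 4.95 against 0.99), even though the true world() pays 1000 for action 1.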

You might imagine that you could score different candidates for probability(...) on how much probability they assign to false statements (at least I at one point thought this might work), but this is not sufficient: an evil probability function might assign 99% probability to “agent()==1 implies world()==1” and to “agent()==2 implies world()==5”, even though

`def world(): return 1000 if agent()==1 else 5`

because both logical statements would turn out to be perfectly true.

Sorry, my previous reply was based on a misparsing of your comment, so I deleted it.

So the “mathematical intuition module” can be 100% truthful and evil at the same time, because making *true* statements isn’t the same as making *provable* statements. Funny. It seems the truth vs. provability distinction is actually the heart of decision theory.

I came across this paragraph from Bruno Marchal today, which strikingly reminds me of Vinge’s “Hexapodia as the key insight”:

Bruno is a prolific participant on the “everything-list” that I created years ago, but I’ve never been able to understand much of what he talked about. Now I wonder if I should have made a greater effort to learn mathematical logic. Do his writings make sense to you?

No, they look like madness.

But logic is still worth learning.

EDIT: it seems I was partly mistaken. What I could parse of Marchal’s mathy reasoning was both plausible and interesting, but parsing is quite hard work because he expresses his ideas in a very freaky manner. And there still remain many utterly unparseable philosophical bits.

Thanks for taking another look. BTW, I suggest if people change the content of their comments substantially, it’s better to make a reply to yourself, otherwise those who read LW by scanning new comments will miss the new content.

This is basically the idea of the “mathematical intuition module”. Wei Dai explained it to me and I spent some time thinking about it, and now I think such an idea cannot work. To make it provably work, you’d need way too many assumptions about “well-behavedness” of the module that our actual math intuition doesn’t satisfy.

Note that by proving “agent()==1 implies world()==1000000”, and then performing action 1, you essentially conclude that “world()==1000000” was *valid* in the first place. “You had the mojo all along”, as Aaronson said (on Löb’s theorem). Decision making is all about lack of logical transparency, inability to see the whole theory from its axioms.

Woah, I just noticed the weird alternative proof of Löb’s theorem contained in that PDF. Can’t even tell if it’s correct. Thanks!

I was confused on the first reading: I thought you were presenting a contrasting reduction of couldness. Then, on reading the earlier comment, I saw the context in which you were presenting this: as a way to clarify the existing free will solution EY gave. Now, it makes more sense.

(It might be helpful to include more of the motivation for this explanation at the start.)

It’s true that I agree with Eliezer’s solution and my post can be seen as a formalization of that, but I’d much rather not discuss free will here and now :-)

It is a formalization of UDT, but I don’t see how it’s a formalization of TDT.

I think Silas meant Eliezer’s “solution to free will”, not TDT.

Posting to confirm. This is why I said:

I believe “solution to free will” was based on TDT (although if you make TDT informal, and then formalize the result, it can well be no longer a formalization of TDT, and work better too).

This has been extremely useful for me, even though I understood the basic idea already, because with the discussion it made me understand something very basic that somehow I’d always missed so far: I’d always thought of this kind of program in the context of applying classical decision theory with the possible actions being particular agent()s. But using classical decision theory already presupposes some notion of “could.” Now I understand that the principle here yields (or is a real step towards) a notion of could-ness even in the real world: If you know the formula that computes/approximates the universe (assuming there is such) and can state a utility function over results of this program, you can program a computer with something like your program, which gives a notion of “could.”

*Thank you* for helping me fill my stupid gap in understanding!

Side note: the intimate way that your notion of “could” is bound up with the program you’re running makes it nontrivial (or at any rate non-obvious to me) how to compare different possible programs. I don’t immediately see a direct equivalent to how, given a decision-theoretic problem, you can compare different efficient heuristic programs on how close they come to the theoretical optimum.

Good question, but surprisingly hard to parse. Took me three attempts.

Basically, you point out that my implementation of agent() quantifies over possible choices, but sometimes it’s useful for agents to quantify over *programs* that lead to choices. Maybe this falls magically out of the proof checker, maybe not. I cannot answer this. My brain hurts.

Ouch, sorry! Too much time spent thinking about these things by myself without having to explain what I mean, probably. I think I still haven’t gotten across the simple point I was trying to make; let me try again.

Given an optimization problem like the TSP, there is a well-defined notion of “what utility/score do I get if I choose to do X,” independent of the algorithm you use to compute a solution—so there is a theoretical optimum solution independent of the algorithm you use to make a choice, and you can compare different heuristic algorithms on how close they come to the theoretical optimum. It seems useful to have this sort of description of the problem when trying to find actual practical algorithms.

I used to hope that we could find an equivalent for timeless decision theories, an abstract description of the optimum solution that we could then try to approximate through heuristic algorithms, but in your work, the definition of “what would happen if I did X” is so intimately connected to the algorithm making the decision that it’s at least not obvious to me how this could work.

Most (all?) proofs of statements like “if I did X, utility would be U” don’t actually need to look at the implementation of the agent, only find the call sites in world(). So we can compare different heuristics without things blowing up. Or did you mean something else?

Do you have an example of a decision theory problem where the optimal solution is hard to find, but heuristics help? Or should I use the TSP as an example?

(Sorry for the delayed reply.) I’m a little confused: it’s true that we can easily compare heuristics for world()s that depend on the agent only through easily identifiable call sites, but if we’re only considering this class of problems, what does your algorithm add over the simpler algorithm of replacing all call sites to agent() by a constant, running the result, and choosing the constant that gives the highest value?

(I suppose I was hoping for more direct game-theoretical applications than you, so I had in mind situations where the (rest of the) world *does* depend on the source of the agent.)

Regarding examples, what I had in mind was: Surely your literal algorithm of enumerating all proofs is not efficient enough to *run* even on toy problems like your implementation of Newcomb! Also, surely for any real-world problem, it won’t be feasible to find the optimal solution. So at some point, we will want to use the theory as a background for looking for practical (and thus necessarily heuristic) algorithms.

Admittedly, not much. If world() calls agent2() that provably returns the same value as agent(), agent() will notice it. But this wasn’t really the point of the post. The point was to show that having perfect knowledge of yourself doesn’t necessarily stop you from considering counterfactuals about yourself. Of course it’s all completely impractical.

Perhaps a closer approximation to reality would be a “listener” algorithm that *accepts* proofs (e.g. by reading game theory books), instead of trying to find them.

It’s true that agent() cannot see where world() ends and agent() begins, but *we* can see that! Otherwise it would be unclear what a “decision-theoretic problem” even means. So we can just compare the return values of world() for different implementations of agent(). Or have I mis-parsed your question?

Note that world() essentially doesn’t talk about *what can happen*; instead it talks about *how to compute utility*, and the computation of utility is a single fixed program without parameters (not even depending on the agent), that the agent “controls” from the inside.

To clarify, in the Newcomb example, the preference (world) program could be:
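(A sketch of what that program presumably looks like: the Newcomb world with its two call sites renamed to agent1() and agent2(); the stub bodies are mine, added only so the snippet runs.)

```python
def agent1(): return 1   # first call site, renamed; stub body is hypothetical
def agent2(): return 1   # second call site, renamed; stub body is hypothetical

def world():
    box1 = 1000
    box2 = 0 if agent1() == 2 else 1000000
    return box2 + (box1 if agent2() == 2 else 0)
```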

while the agent itself knows its code in the form agent3(). If it can prove both agent1() and agent2() to be equivalent to agent3(), it can decide in exactly the same way, but now the preference program doesn’t contain the agent even once, there is no explicit dependency of the world program (utility of the outcome) on the agent’s actions. Any dependency is logically inferred.

And we now have a prototype of the notion of preference: a fixed program that computes utility.

Yep—thanks. This is an important idea that I didn’t want to burden the post with. It would have brought the argument closer to UDT or ADT research, and I’d much prefer if you and Wei Dai wrote posts on these topics, not me. In matters of ethics and priority, one cannot be *too* careful. Besides, I *want* to read your posts because you have different perspectives from mine on these matters.

Thanks, I think I’ll write up this notion-of-preference remark (a little more generally, as a theory rather than a program, with the program as a special case, and with lack of explicit dependence in the discussion of why the program/theory is “fixed”).

One possible problem: There is a difference between a program and the function it computes. The notion of preference is perhaps captured best by a function, not a program. Many programs could be written, all computing the same function. This wouldn’t matter much if we were only going to call or invoke the program, since all versions of the program compute the same result. But this is not what cousin_it suggests here. He wants us to examine the source of the program, to prove theorems about the program. Given finite resources, the things we can prove about one program may not match the things we can prove about another program, *even if* the two programs would compute the same result in all cases if agent() returned the same result.

No, a program. What the program defines is a single constant value, not a function (remember: if we are talking about a program, it’s a constant program, taking no parameters!). And of course it matters *how* that constant is defined, and not at all what that constant is. More generally, we can define that constant not by a program, but by a logical theory, which will be the topic of my next post.

By a “logical theory” do you mean what logicians usually mean by a “theory”? A deductively closed set of sentences?

Wow! Well, that eliminates a lot of the arbitrary character I was objecting to in using *programs* to represent the world/decision problem. But there are still a lot of deductive systems to choose among. I await your next post with interest.

I won’t settle the choice, only point out the generality of the notion and how it applies, and a direction to look for further refinements.

Yep, in my original email world() was called utility().

OK, so just spelling out the argument for my own amusement:

The agent can prove the implications “agent() == 1 implies world() == 1000000” and “agent() == 2 implies world() == 1000”, both of which are true.

Let x and y in {1, 2} be such that agent() == x and agent() != y.

Let Ux be the greatest value of U for which the agent proves “agent() == x implies world() == U”. Let Uy be the greatest value of U for which the agent proves “agent() == y implies world() == U”.

Then Ux is greater than or equal to Uy.

If the agent’s formalised reasoning is consistent, then since agent() == x is true, Ux must be the *only* U such that “agent() == x implies world() == U” is provable.

Now, if there is more than one U such that “agent() == y implies world() == U” is provable, then “agent() == y implies contradiction” is provable, and hence so is “agent() == y implies world() == U” for some value of U larger than Ux. Hence Uy would be larger than Ux, which is impossible.

Therefore, the implications “agent() == 1 implies world() == 1000000” and “agent() == 2 implies world() == 1000” are the only ones provable.

Therefore, agent() == 1.

Thanks for this comment! The idea that only a single Uy can be provable, because otherwise we get a contradiction and hence can prove an arbitrarily high utility, has greatly clarified this post for me. I still haven’t quite figured out why the agent can’t use the above proof that those are the only two provable statements, and the fact that agent()==1 gives a higher value, to prove that agent()!=2 and then create a contradiction.

*Edit:* Actually, I think I now understand this. After finding those two proofs, the agent can only conclude that there aren’t any more proofs of the form “agent()==_ implies world()==_” *if* its reasoning (say PA) is consistent. But it can’t actually prove this!

Yes, this looks right.

Bravo!

The setup can be translated into a purely logical framework. There would be constants A (Agent) and W (World), satisfying the axioms:

(Ax1) A=1 OR A=2

(Ax2) (A=1 AND W=1000) OR (A=2 AND W=1000000)

(Ax3) forall a1,a2,w1,w2 ((A=a1 ⇒ W=w1) AND (A=a2 ⇒ W=w2) AND (w1>w2)) ⇒ NOT (A=a2)

Then the Agent’s algorithm would be equivalent to a step-by-step deduction:

(1) Ax2 |- A=1 ⇒ W=1000

(2) Ax2 |- A=2 ⇒ W=1000000

(3) Ax3 + (1) + (2) |- NOT (A=1)

(4) Ax1 + (3) |- A=2

In this form it is clear that there are no logical contradictions at any time. The system never believes A=1, it only believes (A=1 ⇒ W=1000), which is true.

In the purely logical framework, the word “could” is modeled by what Hofstadter calls the “Fantasy Rule” of propositional calculus.

The direction of setting the problem up in logical setting is further pursued here and here.

The axioms as you wrote them don’t fit or work for multiple reasons. Conceptually, the problem is in figuring out how decisions influence outcomes, so having something like (Ax2) defies this purpose. Statements like that are supposed to be inferred by the agent from more opaque-to-the-purpose-of-deciding definitions of A and W, and finding the statements of suitable form constitutes most of the activity in making a decision.

If the agent can figure out what it decides, then by default it becomes able to come up with spurious arguments for any decision, which makes its inference system (axioms) either inconsistent or unpredictable (if we assume consistency and don’t let (Ax3) into it). The rules that say which arguments lead to which actions shouldn’t be part of the theory that the agent uses to come up with the arguments. This excludes your (Ax3), which can only be assumed “externally” to the agent, a way of seeing if it’s doing OK, not something it should itself be able to see.

A contradiction can be inferred from your axioms in this way: (4) says A=2, which implies [A=1 ⇒ W=0], which by (Ax3) together with [A=2 ⇒ W=1000000] implies that A=1, which is a contradiction and a decision to do whatever (I think you’ve also mixed up A=1 and A=2 while going from (Ax2) to (1) and (2)).

Thanks! This site is rather large :) And the most interesting (for me) stuff is rarely linked from the sequences. That is, it seems so… Is there a more efficient way to get up to date? [I would SO like to have found this site 5 years ago!]

The axioms were not proposed seriously as a complete solution, of course. The (Ax2) can be thought of as a cached consequence of a long chain of reasoning. The main point of the post was about the word “could”, after all.

(Ax3) is just an axiomatic restatement of the “utility maximization” goal. I don’t see why an agent can’t know about itself that it’s a utility maximizer.

Yes, A=1 and A=2 were mixed up in (Ax2); fixed it now. But I don’t see your contradiction. (Ax3) + [A=1 ⇒ W=0] + [A=2 ⇒ W=1000000] does not imply A=1. It implies NOT(A=1).

See the posts linked to from UDT, ADT, TDT, section “Decision theory” in this post.

It can, it’s just a consideration it can’t use in figuring out how the outcome depends on its decision. While looking for what is influenced by a decision, the decision itself should remain unknown (given that the decision is determined by what this dependence turns out to be, it’s not exactly self-handicapping).

It seems the mixing up of A=1 and A=2 didn’t spare this argument, but it’s easy enough to fix. From A=2 we have NOT(A=1), so [A=1 ⇒ W=1,000,000,000] holds vacuously; together with [A=2 ⇒ W=1,000,000], (Ax3) then gives NOT(A=2), a contradiction.

I thought about a different axiomatization, which would not have the same consistency problems. Not sure whether this is the right place, but:

Let X be a variable ranging over all possible agents.

(World Axioms)

forall X [Decides(X)=1 ⇒ Receives(X)=1000000]

forall X [Decides(X)=2 ⇒ Receives(X)=1000]

(Agent Axioms—utility maximization)

BestX = argmax{X} Receives(X)

Decision = Decides(BestX)

Then Decision=1 is easily provable, regardless of whether any specific BestX is known.

Does not seem to lead to contradictions.
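A direct toy rendering of these axioms, identifying each possible agent X with the constant decision it makes (a simplification of mine):

```python
def decides(x):
    # Identify agent X with its constant decision
    return x

def receives(x):
    # World Axioms: deciders of 1 receive 1000000, deciders of 2 receive 1000
    return 1000000 if decides(x) == 1 else 1000

# Agent Axioms: utility maximization over all possible agents
best_x = max((1, 2), key=receives)
decision = decides(best_x)
```

As claimed, decision comes out as 1, without needing to identify any specific BestX.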

Yes, I see now, thanks.

If I attempted to fix this, I would try to change (Ax3) to something like:

forall a1,a2,w1,w2 ((A=a1 ⇒ W=w1) AND (A=a2 ⇒ W=w2) AND (w1>w2)) ⇒ NOT (FinalA=a2)

where FinalA is the actual decision. But then, why should the world’s axiom (Ax2) be defined in terms of A and not FinalA? This seems conceptually wrong...

Ok, I’ll go read up on the posts :)

I enjoyed reading and thinking about this, but now I wonder if it’s less interesting than it first appears. Tell me if I’m missing the point:

Since world is a function depending on the integers agent outputs, and not on agent itself, and since agent knows world’s source code, agent could do the following: instead of looking for proofs shorter than some fixed length that world outputs u if agent outputs a, it could just simulate world and tabulate the values world(1), world(2), world(3),… up to some fixed value world(n). Then output the a < n + 1 for which world(a) is largest.

In other words, agent can just ask world what’s the best it can get, then do that.

Granted, there are situations in which it takes longer to compute world(n) than to prove that world(n) takes a certain value, but the reverse is also true (and I think more likely: the situations I can think of in which cousin_it’s algorithm beats mine, resource-wise, are contrived). But cousin_it is considering an idealized problem in which resources don’t matter, anyway.

Edit: If I’m wrong about this, I think it will be because the fact that world depends only on agent’s output is somehow not obvious to agent. But I can’t figure out how to state this carefully.

You’re not completely wrong, but the devil is in the details. The world’s dependence on the agent’s output may indeed be non-obvious, as in Nesov’s comment.

Are you saying there’s a devil in my details that makes it wrong? I don’t think so. Can you tell me a tricky thing that world can do that makes my code for agent worse than yours?

About “not obvious how world depends on agent’s output”, here’s an only very slightly more tricky thing agent can do, that is still not as tricky as looking for all proofs of length less than n. It can write a program agent-1 (is this what Nesov is getting at?) that always outputs 1, then compute world(agent-1). It can next write a program agent-2 that always outputs 2, then compute world(agent-2). And so on up to agent-n. Then have agent output the k for which world(agent-k) is largest. Since world does not depend on any agent’s source code, if agent outputs k then world(agent) = world(agent-k).

Again this is just a careful way of saying “ask world what’s the best agent can get, and do that.”

Am I right that if world’s output depends on the length of time agent spends thinking, then this solution breaks?

Edit: I guess “time spent thinking” is not a function of “agent”, and so world(agent) cannot depend on time spent thinking. Wrong?

Edit 2: Wrong per cousin’s comment. World does not even depend on agent, only on agent’s output.

Yes. This solution is only “optimal” if world() depends only on the return value of agent(). If world() inspects the source code of agent(), or measures cycles, or anything like that, all bets are off—it becomes obvously impossible to write an agent() that works well for every possible world(), because world() could just special-case and penalize your solution.

*Edit:* You weren’t wrong! You identified an important issue that wasn’t clear enough in the post. This is the right kind of discussion we should be having: in what ways can we relax the restrictions on world() and still hope to have a general solution?

One can interpret the phrase “world calls agent and returns utility” with different levels of obtuseness:

1. World looks up agent, examines it, runs it, sees what its output was and what the intermediate steps were, then decides what agent deserves.

2. World looks at a sheet of paper agent has written a number on. Analyzes the handwriting. Then decides what agent deserves.

3. World does not even analyze the handwriting.

You mean 3, right? That’s all I meant by edit 2.

Yes, that’s right.

Neat. It seems like one could think of the agent as surveying mathematical objects (sets of theorems of the form “agent()==A implies world()==U”) and then forcing itself to be in a particular mathematical object (that which contains a particular true statement “agent()==A implies world()==U”). If we’re thinking in terms of an ensemble, the agent would already be in a particular mathematical object, so perhaps its procedure could be thought of as forcing itself into a particular mathematical sub-structure.

When I think about this stuff I get the feeling I’m confused on many levels.

You were right: reading a second time helped. I figured out what is bothering me about your post. The above seems misspoken, and may lead to an overly constricted conception of what the agent “could” do.

The theorems proved use statements having false antecedents, but there are no *false premises* used by the agent. Any statement beginning “If agent()==2 then...” is obviously true if the “if” is a simple material conditional.

If the “if” is of the counterfactual variety, it can still be true. There’s nothing mysterious about counterfactuals: they follow trivially from physical laws. For example, if we have an ideal gas, it follows from PV=nRT that doubling the temperature with constant n and V would double the pressure. Supposing that our agent can be thoroughly described by deterministic laws doesn’t undermine counterfactuals—it grounds them.

To get from “if … would …” to “could...”, where the “could” signifies ability, I would argue, it is sufficient that rationality play a critical causal role. Roughly, if the only thing preventing the enactment of the sequence is the rationality of the agent itself, then the sequence is one of the things the agent could do. If outputting “1“ were optimal, the agent would do so; if “2” were optimal, the agent would do “2”; consideration of optimality or sub-optimality is a rational process; so those are both options for the agent.

There’s nothing wrong with arguing for a modest conclusion, focusing on one restricted case of what an agent could do. But there’s no reason to settle for that, either.

I don’t understand your comment at all. You seem to assume as “trivial” many things that I set out to demystify. Maybe you could formalize your argument?

I’m not sure if I understand torekp’s comment right, but it seems to me that the issue is the difference between “physical” and “logical” counterfactuals in the following sense:

For concreteness and simplicity, let’s suppose physics can be modeled by a function step(state) that takes a state of the world, and returns the state of the world one instant later. Then you can ask what would happen if the current state were slightly different (“suppose there were an apple here in the air at this point in time; then it would fall down”). Call this a “physical” counterfactual.

Before reading Eliezer’s arguments, I used to think this could be used to model decision-making: Let stateA be what the state of the world will really be at the point I’ve made my decision. For every choice X I could make, let stateA_X be the same as stateA except that the future copy of me is replaced by a version that has decided to do X. Now it’s well-defined what the future of every stateA_X looks like, and if I have a well-defined utility function, then it’s well-defined which choice(s) maximize this utility. Of course I cannot calculate this, but, I thought, the actual decision-making process could be seen as an approximation of this.

Except that this would lead to classical decision/game theory, with its unattractive consequences—the case where Omega has just made two identical physical copies of me, and has us play a true Prisoner’s Dilemma, etc. Since I can prove that the other copy of me will do the same thing in every stateA_X (because I’m only replacing myself, not the other copy), it follows that I should defect, because it gives me the higher payoff no matter what the other copy of me does.

Thus “physical” counterfactuals do not make for an acceptable decision theory, and we’re forced to look for some notion not of “what would happen if the state of the world were different,” but of something like “what would happen if this Turing machine would return a different result?” That would be a “logical” counterfactual. Of course, it’s not obvious what the hell this is supposed to mean, and if it can be made to mean anything coherent—your post is an answer to that question.

Benja’s got it: I’m interested in physical counterfactuals. They are the type that is involved in the everyday notion of what a person “could” do.
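A toy illustration of the stateA_X construction (the state encoding, step(), and the surgery are all invented for illustration): holding the other copy's action fixed is exactly what makes this model recommend defection against a copy.

```python
# Toy "physical counterfactual": a world state is (my_action, other_action, pot).
# step() resolves a one-shot game; we surgically replace only our own part of
# the state (stateA -> stateA_X) and run the same dynamics forward.

def step(state):
    """One tick of toy physics: pay out based on both agents' actions."""
    me, other, pot = state
    payout = 3 if (me == "C" and other == "C") else (5 if me == "D" else 0)
    return (me, other, pot + payout)

def counterfactual_utility(stateA, my_choice):
    """stateA_X: same state, but with my decision replaced by my_choice."""
    me, other, pot = stateA
    stateA_X = (my_choice, other, pot)
    return step(stateA_X)[2]

# The other copy's action is held fixed at whatever it "really" is ("C" here),
# so the surgery concludes that defecting pays more.
stateA = ("C", "C", 0)
print(counterfactual_utility(stateA, "C"))  # -> 3
print(counterfactual_utility(stateA, "D"))  # -> 5
```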

As for decision theory, I think that the “logical” counterfactuals should supplement, not supplant, the physical counterfactuals.

I understand that many people share this opinion, but I don’t see much in the way of justification. What arguments do you have, besides intuition? I see no reason to expect the “correct” decision theory to be intuitive. Game theory, for example, isn’t.

“Logical” counterfactuals either mean those whose antecedents are logically impossible, or those whose consequents follow from the antecedents by logical necessity, or both. In your Newcomb’s-problem-solving algorithm example, both features apply.

But there are many decisions (or sub-problems within decisions) where neither feature applies. Where the agent is a human being without access to its “source code”, it commits no contradiction in supposing “if I did A… but if I did B …” even though at most one of these lies in the future that causality is heading toward. Additionally, one may be unable to prove that the expected utility of A would be U1 and the expected utility of B would be U2, even though one may reasonably believe both of these. Even when we know most of the relevant physical laws, we lack knowledge of relevant initial or boundary conditions, and if we had those we’d still lack the time to do the calculations.

Our knowledge of physical counterfactuals isn’t deductive, usually, but it’s still knowledge. And we still need to know the physical consequences of various actions—so we’ll have to go on using these counterfactuals. Of course, I just used another intuition there! I’m not concerned, though—decision theory and game theory will both be evaluated on a balance of intuitions. That, of course, does not mean that every intuitively appealing norm will be accepted. But those that are rejected will lose out to a more intuitively appealing (combination of) norm(s).

In a previous comment, I said that physical counterfactuals follow trivially from physical laws, giving an ideal-gas example. But now I just said that our knowledge of physical counterfactuals is usually non-deductive. Problem? No: in the previous comment I remarked on the relation between truths in attempting to show that counterfactuals needn’t be metaphysical monsters. In this comment I have been focusing on the agent’s epistemic situation. We can know to a moral certainty that a causal relationship holds without being able to state the laws involved and without being able to make any inference that strictly deserves the label “deduction”.

I’m uncomfortable with the unbounded CPU time assumption. For example, what if world() computes the payoff in a Prisoner’s Dilemma between two identical agent()s? You get non-termination due to a recursion without an escape clause.

The implementation of agent() given in the post is guaranteed to terminate. It enumerates proofs about itself and the world up to a specified max proof length; it doesn’t call itself recursively.
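A sketch of how such a bounded search terminates, heavily stubbed: the "theorems" here are hand-written stand-ins for whatever proofs of “agent()==A implies world()==U” the checker finds within the length bound (the Newcomb values from the post), not a real proof search.

```python
# Toy sketch of the bounded proof search: enumerate the finitely many
# theorems found within the bound, then act. No recursion, so it halts.

MAX_PROOF_LENGTH = 1000  # the bound that guarantees termination (unused by this stub)

# Stand-ins for theorems "agent()==A implies world()==U", as in Newcomb's Problem.
THEOREMS = {1: 1000000, 2: 1000}  # action -> proved utility

def agent():
    # Pick the action with the highest proved utility; if no proofs were
    # found at all, fall back to a default (e.g. "just return zero").
    if not THEOREMS:
        return 0
    return max(THEOREMS, key=THEOREMS.get)

print(agent())  # -> 1 (one-boxing)
```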

Ok, I had been mistakenly assuming that it accomplished the proofs by simulating world(), which would necessarily involve recursive calls back to agent(), at least in a multi-player game. My bad.

But I am still not convinced that agent() is guaranteed to terminate. Suppose, for example, that it cannot find any proofs of the form “agent()==A implies world()==U”?

Also, in my example in which world() is supposed to compute the results of a Prisoner’s Dilemma between two identical agents, is the “knowledge” that the two agents are indeed identical supposed to reside in world() or in agent()? I suspect that the right way to extend this thought experiment to the multi-player game scenario is to keep world() in the dark as to whether agent1() and agent2() are identical algorithms. Don’t hard-code this fact into the agents, either. I think you have to let the two agents discover that they are identical to each other. The thing is, this “understanding” has to be achieved by proving things about proofs. So, I think that modal logic slips back in anyways.

It’s true that the program might fail to find any proofs at all. Then it could output a random action, I guess? Or just return zero?

Puzzled about your last paragraph. The “knowledge” that the agents are equivalent has to be derivable by inspecting the program as a whole. (This is a very strong condition, but it can be satisfied.) Proving things about proofs is something proof checkers with sufficiently strong axioms can do just fine, no modal logic needed.

But in general, I don’t believe this sort of reasoning carries over to multi-player games. If we take the Prisoner’s Dilemma and tweak the payoffs slightly to make them non-symmetric, the agents are back to square one. Game theory is inherently harder than individual rationality. I really should write another post on that, because it seems to be a Pons Asinorum that many intelligent people fail to cross.

Seems to me you understood it and responded appropriately.

Yes it is. One important difference is that game-playing agents generally use mixed strategies. So we need to distinguish reading the strategy source code of another player from reading the pseudo-random-number source code and “seed”. Rational agents will want to keep secrets.

A second difference is that the “preference structure” of the agent ought to be modeled as data about the agent, rather than data about the world. And some of this is information which a rational agent will wish to conceal as well.

So, as a toy model illustrating the compatibility of determinism and free will, your example (i.e. this posting) works fine. But as a way of modeling a flavor of rationality that “solves” one-shot PD and Newcomb-like problems in a “better” way than classical game theory, I think it doesn’t (yet?) work.

Slightly tangential: I have a semi-worked-out notion of a “fair” class of problems which includes Newcomb’s Problem because the predictor makes a decision based only on your action, not the “ritual of cognition” that preceded it (to borrow Eliezer’s phrase). But the one-shot PD doesn’t reside in this “fair” class because you don’t cooperate by just predicting the opponent’s action. You need to somehow tie the knot of infinite recursion, e.g. with quining or Löbian reasoning, and all such approaches seem to inevitably require inspecting the other guy’s “ritual of cognition”. This direction of research leads to all sorts of clever tricks, but ultimately it might be a dead end.

As for your main point, I completely agree. Any model that cannot accommodate limited information is bound to be useless in practice. I’m just trying to solve simple problems first, see what works and what doesn’t in very idealized conditions. Don’t worry, I’ll never neglect game theory—it was accidentally getting my hands on Ken Binmore’s “Fun and Games” that brought me back to math in the first place :-)

I don’t understand. Do you mean changing the numeric values, leaving the game the same, or changing the ordering, turning it into a different game? In the first case, it seems to me that the argument from your other post should work just fine [Edit: Wrong, see cousin_it’s explanation below] -- roughly:

Suppose there’s a short proof that agent1()==”C” iff agent2()==”cooperate”. Then there’s an only slightly longer proof that agent1()==”C” implies that both agents get the C/C payoff, and that agent1()==”D” implies both get the D/D payoff. Therefore, under the supposition, agent1 will choose “C”; similarly, agent2 will choose “cooperate”.

In other words: If there’s a short proof that agent1()==”C” iff agent2()==”cooperate”, then agent1()==”C” and agent2()==”cooperate”. By the Löbian reasoning in your previous post (linked above), it follows that both agents cooperate.

Am I confused about something? Or is it just that you were in fact talking about changing the payoffs so that the game is not a PD any longer? (I think it still at least bears mentioning if your work already solves the class of problems that Eliezer has been talking about, “where the other agents’ decisions can be viewed as functions of one argument, that argument being your own choice in that particular case.”)

This post does solve the class of problems Eliezer was talking about (I call it the “fair” class), but my previous Löbian post used a different algorithm that didn’t do utility maximization. That algorithm explicitly said that it cooperates if it can find any proof that agent1()==agent2(), without comparing utilities or trying to prove other statements, and that explicitness made the proof work in the first place. Your proof fails here:

It’s extremely difficult for the agent to conclude that it will make any particular choice, because any proof of that (starting from your assumptions or any other assumptions) must also prove that the agent won’t stumble on any other proofs that lead to yet higher outcomes. This amounts to the agent assuming the consistency of its own proof checker, which is a no-go.

As far as I can see right now, making the payoffs slightly non-symmetric breaks everything. If you think an individual utility-maximizing algorithm can work in general game-theoretic situations, you have a unique canonical solution for all equilibrium selection and bargaining problems with non-transferable utility. Judging from the literature, finding such a unique canonical solution is extremely unlikely. (The transferable utility case is easy—it is solved by the Shapley value and my Freaky Fairness algorithm.)

When does the agent need to have that knowledge? It just acts on certain conclusions, at the moment it obtains them. It can as well act on conclusions1 about its future conclusions2; the only problem might be that it never reaches those conclusions1 (and so won’t act on them).

The point is that I relied on the fact that if the agent can prove “If I cooperate, I get 5” and “If I defect, I get 2”, then the agent will cooperate. This is a fact, but the agent cannot prove it: if its proof system were inconsistent, then it might (within maxsteps) also be able to prove “If I defect, I get 42,” and thus end up defecting.

Maybe your comment suggests that the agent should choose to cooperate as soon as it has proven the first two statements above; but as far as the agent knows, if its proof system is inconsistent, it might find the proof of “If I defect, I get 42” earlier than the proof of “If I defect, I get 2.” The agent cannot prove that “I will prove C → 5 and D → 2” implies “I will cooperate.”

The agent can just cooperate as soon as it proves “If I cooperate, I get X” and “If I defect, I get Y”, for X>Y. This algorithm is justified, and the agent doesn’t need to write its own algorithm, we are writing it.

I don’t completely understand your idea, but Löbian cooperation seems to require enumerating a lot of proofs without stopping early. If Bob can stop early, Alice now needs to prove that Bob will stumble on a specific proof earlier than that, rather than at all. I remember trying and failing to remove the exhaustive search for proofs from my previous post. Or maybe I’m missing something?

Maybe you could formalize your idea and give some sort of proof? If it works, it’ll be definite progress.

IIUC, Vladimir is arguing that we can write an algorithm specifically for the PD that will cooperate if it finds any two proofs “C → I get X” and “D → I get Y”, for X>Y. I think this algorithm will indeed do the right thing, do you agree?
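This proposal, as I read it, sketched in Python with a stubbed proof enumerator (the list of "proved statements" stands in for enumerating PA proofs in order of length; the function name and encoding are mine):

```python
# Cooperate the moment proofs of "C -> I get X" and "D -> I get Y" with
# X > Y have both been found; otherwise fall through to defection.
# Note the asymmetry: a proof that defection is better never triggers
# an early "D" -- defection only happens by default at the end.

def act(proof_stream, default="D"):
    proved = {"C": set(), "D": set()}
    for action, utility in proof_stream:   # statement "action -> I get utility"
        proved[action].add(utility)
        if any(c > d for c in proved["C"] for d in proved["D"]):
            return "C"
    return default  # no such pair found within the bound: defect

# Stub proof streams, in order of proof length:
print(act([("C", 5), ("D", 2)]))  # -> "C"
print(act([("C", 1), ("D", 3)]))  # -> "D"
```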

Intuitively, the point is that both in Vladimir’s algorithm and your earlier proof, it is sufficient to know something of the form “there exists a proof such that...” to conclude that the program will have a certain behavior (cooperate), whereas in my purported proof, you need to know something of the form “there exists no proof such that...”, aka “for all proofs, it holds that...” (which you’d also need in both Vladimir’s proposal and your earlier proof to show that one of the agents defects).

Ah. Now I see where the actual work is being done. Vladimir, sorry, I retract my request for clarification.

The proposed algorithm will probably work for the usual Löbian reasons (don’t have the energy to write a formal proof right now, though). But it is not utility maximization. This algorithm asymmetrically “privileges” cooperation over defection: finding a proof that cooperation is better will make it cooperate, but finding a proof that defection is better won’t make it defect. So in spirit it’s pretty similar to my old algorithm, and we probably won’t find a solution to game theory on these lines.

Conjecture:

If you just look for the proofs of statements of the form

(agent()==A1 implies world()==U1) and (agent()==A2 implies world()==U2) and U1 > U2,

where U1 and U2 are variables, and A1 and A2 are different constants (cooperate and defect in some order), up to some sufficient timeout (after which you act randomly), and perform A1 the moment you see the first one, you’ll play correctly against your near-copy (with different timeout or syntactic differences), and defecting and cooperating rocks.

Maybe my math intuition is utterly failing me today, but I find this conjecture very hard to believe. Could you try proving it?

Now that I disagree with! If the algorithm finds a proof that defection is better, it will defect, because the proof system is consistent, so the algorithm can’t in this case find a proof that cooperation is better. (It’s true that it cannot prove that it will defect, of course.) In fact I’d say the algorithm privileges defection, because it’ll defect unless it finds a reason to cooperate.

I don’t see any straightforward generalization to arbitrary games, but a natural generalization to games where every player has only two choices springs to mind. Take as parameters the payoff matrix and the other players’ source code; then play minimax unless you can prove that playing the other strategy will result in a higher payoff. [Edit: Er, unless the utility you can prove you get under the assumption that you play the other strategy is higher than etc etc… you know what I mean.] Haven’t yet looked into whether this actually does give sensible answers to non-PD games, though. *goes off to do just that*

What on Earth do you mean by this counterfactual? :-)

The obvious extension of your proposed algorithm to games with many players and many strategies: assume any commonly-accepted method of finding a “fair strategy profile”, attempt Löbian cooperation to reach it, otherwise play minimax. This doesn’t solve fairness/bargaining, and we all are quite familiar with this sort of trick already.

Essentially, what you’re doing is “porting” old ideas from one enforcement method (quining) to another (Löbian cooperation), hoping that fairness will magically pop out this time. No. Fairness and enforcement are orthogonal issues. I’m really frustrated that people stubbornly fail to see fairness as the serious obstacle to “solving” game theory. There’s no cheap accidental way around it.

ETA: on rereading, this comment sounded harsh. Please don’t take it personally—I just feel that at this point in the discussion, it’s time to forcefully state some of my background intuitions so other people can break them down more easily.

It’s not a counterfactual, it’s a logical implication, so it’s true! :-) No, seriously, if it wasn’t clear, I mean that against every opponent program for which it can prove a higher utility for defection, it will indeed defect; e.g. against the always-cooperating program and the defection rock.

My intuition is that the algorithm above precisely does not encode a notion of “fair strategy profile,” it just tries to maximize its own utility; I suppose that’s a fuzzy notion, though. I wasn’t arguing that it solves bargaining. An extension to many strategies should cooperate in the finitely iterated PD without wiring in any more notion about “fairness” than in the algorithm above, and without special-casing IPD—if that’s indeed simple I’m curious how it works, it’s not obvious to me.

Ohhh, I managed to miss that. Thank you. This is really good.

About bargaining, I reason like this: if you have a program that’s good enough at maximizing utility that it can cooperate with itself in the PD, and you make this program play different games against itself, this will give us a canonical way to calculate a “fair” outcome for every game. But this is very unlikely, so a utility-maximizer that plays arbitrary games well is by implication also very unlikely.

How do your programs play the iterated PD? Do they have memory? I can’t understand what you mean yet.

It seems to me that bargaining will be resolved simultaneously with the problem of deciding under uncertainty (when you can’t hope to find a proof of utility being precisely U).

On one hand, this sounds reasonable apriori.

On the other hand, making games “fuzzy” to solve bargaining has been tried, and it’s not enough.

On the third hand, I feel that some games might be genuinely indeterminate because they abstract too much, they don’t include enough information from the real-world situation—information that in practice ends up determining the outcome. For example, (instantaneous) bargaining in the Rubinstein model depends on the players’ (temporal) discount rates, and if you forgot to look at them, the instantaneous game seems pretty damn indeterminate.

My proposal can’t handle the iterated PD, only games where each player has two choices. I was replying to you saying there was an obvious generalization to more than two strategies—if there is, then we can pass in the payoff matrix of the normal form of a finitely iterated PD (and if it’s a sensible generalization, it should cooperate with itself on every round).
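For what it’s worth, the Rubinstein dependence on discount rates mentioned upthread can be made concrete: in the standard alternating-offers model, the first proposer’s equilibrium share is (1 − d2)/(1 − d1·d2), so the “instantaneous” split is determined entirely by the discount factors.

```python
# First proposer's share in Rubinstein's alternating-offers bargaining game,
# as a function of the two players' discount factors d1, d2 in (0, 1).

def proposer_share(d1, d2):
    return (1 - d2) / (1 - d1 * d2)

# Equal patience -> near-even split; a more patient opponent -> smaller share.
print(round(proposer_share(0.9, 0.9), 3))   # -> 0.526
print(round(proposer_share(0.9, 0.99), 3))  # -> 0.092
```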

Your argument about bargaining makes sense, though alas I don’t know enough about bargaining to have a really informed opinion. It may be that the idea only works for sufficiently PD-like games, but if we can handle a class of them without special-casing that doesn’t seem so bad. It does at least handle Chicken [Edit: and Stag Hunt] correctly, AFAICS, so it’s not as if it’s PD-only.

Um, the obvious generalization to many strategies must “privilege” one of the strategies apriori, the same way as your algorithm “privileges” cooperation. Otherwise, what single statement would the proof checker be trying to prove? I don’t see a way around that.

Ah, sorry, now I understand what’s going on. You are saying “there’s an obvious generalization, but then you’d have to pick a ‘fair’ strategy profile that it would privilege.” I’m saying “there’s no obvious generalization which preserves what’s interesting about the two-strategy case.” So we’re in agreement already.

(I’m not entirely without hope; I have a vague idea that we could order the possible strategies somehow, and if we can prove a higher utility for strategy X than for any strategy that is below X in the ordering, then the agent can prove it will definitely choose X or a strategy that is above it in the ordering. Or something like that. But I need to look at the details much more closely.)

I’m not sure when exactly you added this, is this still what you think I’m doing? What I’m trying to do is to give an algorithm that correctly handles games “where the other agents’ decisions can be viewed as functions of one argument, that argument being your own choice in that particular case” (same Eliezer quote as upthread), without special-casing for the details of particular games like you did with your solution of the PD. I.e., I’m trying to do what I at first thought your original post did, but what you explained to me it doesn’t. AFAICS, if Eliezer’s informal arguments are correct, these games have a clear correct solution without solving bargaining.

[The ETA’s duly noted :-)]

Can you describe this class of games precisely? And can you define the solution precisely, without referring to our algorithms?

I can describe a class of such games and its solution, but I’d hope that if a good decision-theoretical agent exists that solves this class, it might also solve some wider class of problems in an intuitively correct way. -- That said, the class is symmetric games, and the solution is the strategy profile that yields the highest utility, among those strategy profiles in which all players choose the same strategy. [More precisely, a good algorithm should choose this strategy when playing against itself or a similar algorithm, should maximize its utility when playing against an opponent that unconditionally plays some given strategy, and should “do something sensible” in other cases.]

The quining-cooperator algorithm in the Prisoner’s Dilemma forms a “Nash equilibrium in algorithms” against itself. This is a very desirable property to have, which roughly corresponds to “reflective consistency” in single-player decision theory. However, the algorithm you’re trying to develop (which includes maximizing utility against an opponent that plays a strategy unconditionally) will not form a “Nash equilibrium in algorithms” against itself, even in symmetric games.

To see this, consider a simple bargaining game, a variant of “dividing the dollar”. Each player sees the other’s source code and outputs an integer. If the sum of the two integers is less than or equal to 10, both players get the amount in dollars that they asked for. If the sum is greater than 10, both get nothing. Your algorithm will output 5 when playing against itself. But this means it’s not a best reply to itself, because if your opponent is replaced with the simple program “return 9”, you will maximize utility and get only 1.
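The dividing-the-dollar example is easy to simulate; here is a toy version in which "inspecting the source code" is faked by checking function identity (only a stand-in for real source-code reasoning; all names here are mine):

```python
# Variant of "dividing the dollar": each player names an integer; if the
# sum is <= 10 both get what they asked for, otherwise both get nothing.

def payoff(a, b):
    return (a, b) if a + b <= 10 else (0, 0)

def rock(opponent):
    return 9  # unconditionally demand 9

def maximizer(opponent):
    # Crude stand-in for source inspection: recognize an exact copy of
    # ourselves and coordinate on 5/5; otherwise simulate the opponent
    # (safe here, since the rock ignores its argument) and best-respond.
    if opponent is maximizer:
        return 5
    return 10 - opponent(None)

print(payoff(maximizer(maximizer), maximizer(maximizer)))  # -> (5, 5)
print(payoff(maximizer(rock), rock(maximizer)))            # -> (1, 9)
```

The second line is the point of the example: the utility-maximizing reply to a “take 9” rock is to take only 1.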

On reflection: I agree that “Nash equilibrium in algorithms” would be desirable, but you seem to find it a more knock-out argument than I do. If we’re discussing what kinds of algorithms to enter into something like Axelrod’s tournament, then clearly, Nash equilibrium in algorithms would be a very compelling property. But if world() really is (a utility function over) the whole universe, then it’s less clear to me that “take 1” is not the right thing to do when encountering a “take 9” rock.

Intuitively, the reason you wouldn’t want to “take 1” in this case is that you would not want to give someone else—Clippy, say—a motive to leave “take 9” rocks around for you to find. But there’s the counter-intuition that:

(a) If Clippy does this as a result of reasoning about how you behave, then that’s in fact a different situation (the assumption being that you’re reasoning about the source of the whole world, not just about the source of the “take 9” rock in front of you). What you do in this situation influences whether Clippy will have left behind a “take 9” rock, so your decision algorithm should not be able to conclude that you maximize utility by “taking 1.” [I’m assuming a UDT-like agent, which one can think of as taking the source of the world and returning the actions to perform in different contingencies, so it does not make sense to say that we cannot influence what Clippy does because our sensory input tells us that Clippy has “already” left behind a “take 9” rock.]

(b) If Clippy does not do this as a result of actually reasoning about how you behave, then if you “take 5” (say), Clippy will still have left a “take 9” rock behind—by assumption, there’s no logical causality from what you do to what Clippy does.

I can’t formalize this argument, and I’m not sure that (b) especially is the right way of thinking about this problem, but this counter-intuition seems at least as compelling to me as the original intuition. Am I missing something?

-- All this said, I no longer think that the idea behind my proposed algorithm is useful. The problem is, as far as I can see, that if agent 1 runs “do X unless I can prove it’s better to do Y” and agent 2 runs “do A unless I can prove that it’s better to do B,” then the agents won’t play (X,B) or (Y,A) even if this would be the most desirable outcome for both of them—or at least I can’t prove that these outcomes are possible.

“Play minimax unless I can prove the other strategy is better” doesn’t exactly run into this problem in symmetric games as long as the two available strategies have different minimax values, but what I’m really after is a decision theory that would act correctly if it finds such a game embedded in the world, and this does seem to run into the problem described above.

Hello again.

User Perplexed pointed me to Nash’s 1953 paper Two-person cooperative games that seems to give a unique “fair” solution for all two-person competitive games in our setting. I’ve been thinking how to spin the algorithm in the paper into a flavor of utility maximization, but failing so far.

The key is to understand that when two players form a coalition they create a new entity distinct from its members which has its own revealed preferences and its own utility to maximize. That is, you need to step up a level and imagine each member of a coalition as making a deal with the coalition itself, rather than simply dealing with the other members. In joining a coalition, a player gives up some or all of his decision-making power in exchange for a promise of security.

Define coalescence utility for player A as the amount of extra A utility A gets because the pair plays the cooperation game rather than the threat game. If you wish, think of this as a compensation paid to A by the collective in exchange for his cooperation in the objectives of the collective. Define the utility of cooperation to the ‘collective’ (i.e. to the union of A and B considered as an entity at a different level than A and B themselves) as the sum of the logarithms of the coalescence utilities of A and B. Then, the collective maximizes its own utility by maximizing the product of the coalescence utilities (which means the same as maximizing the sum of the logs).
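The product/log-sum equivalence is easy to check numerically; a quick sketch with a toy surplus and a zero threat point (both assumptions mine, for illustration):

```python
import math

# Split a surplus of 10 between A and B; coalescence utility = each player's
# gain over the threat game, taken as 0 here for simplicity.

candidates = [(x, 10 - x) for x in range(1, 10)]  # integer splits, both gain > 0

def nash_product(split):
    a, b = split
    return a * b

def log_sum(split):
    a, b = split
    return math.log(a) + math.log(b)

# Maximizing the sum of logs picks the same split as maximizing the product,
# since log is monotone.
best_by_product = max(candidates, key=nash_product)
best_by_logs = max(candidates, key=log_sum)
print(best_by_product, best_by_logs)  # -> (5, 5) (5, 5)
```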

This approach of taking the logarithm of individual utilities to get components of corporate or collective utility extends nicely to coalitions with more than 2 members. For some analogies in other (biological) kinds of optimization processes where taking the logarithm is the trick that makes it all work, see my favorite paper ever or this classic paper from Dick Lewontin:

R. C. Lewontin and D. Cohen. On population growth in a randomly varying environment. Proceedings of the National Academy of Sciences, 62:1056–1060, 1969.

True; thanks for pointing that out.

I don’t see what theory you refer to here, in saying “the proof system is consistent”. We have an algorithm that proves some theorems in PA. The only theory under discussion is PA. If the algorithm finds a proof that defection is better, it defects, period.

If PA is the system the agent uses, then by “the proof system,” I mean PA. I was speaking generically, in case you’d prefer to use ZFC or PA + Con(PA) or whatever.

Er, are you saying that as soon as the algorithm finds such a proof, it stops searching and returns “defect”? That will not work, because then the Löbian reasoning won’t go through, for the reason cousin_it gave. [Because PA cannot prove Con(PA), it also cannot prove that “there’s a short proof that cooperate is better” implies “the agent cooperates”—for all PA knows, PA might be inconsistent, and there might be an even shorter proof that defect is better.]

Or do you mean that the algorithm looks through the whole list of proofs, and if it never finds a proof that cooperate is better, it defects? Then we’re in agreement; I was simply pointing out that in this case, we know that if the algorithm proves “defect is better,” then it will not cooperate, because we know that PA is consistent, so if the algorithm proves “defect is better” then it will not prove “cooperate is better.”

It can enumerate all shorter proofs, and see if any of them imply a different action of the agent. (Note that PA is reflexive, that is, it can prove that any given finite set of its statements is logically consistent.)

Can you give some link or citation about “reflexive”? I can’t figure out what this means. It seems to me that if PA can prove every finite subset of its axioms consistent, then it can prove itself consistent: “If PA were inconsistent, then there would be a statement A and a proof of A and a proof of NOT A; but each proof only uses a finite set of axioms; the union of two finite sets is finite; so there is a finite set of axioms of PA that proves both A and NOT A; but every finite subset of PA is consistent; contradiction. Ergo, PA is consistent.” Clearly, I’m misunderstanding something.

(PA can’t talk about sets in general, but finite sets are fine, and “finite subset of PA” just means “finite set of propositions, all of which are in the enumeration that is PA.”)

It can’t prove that every finite set of its statements is consistent, but for every finite set of its statements it can prove its consistency. (See this chapter, Corollary 8.)

I don’t understand. The relevant step in the argument, as far as I understand it, requires an inference from “there exists a proof in n steps that cooperate is better” to “the agent cooperates”. It seems to me that “enumerating all shorter proofs” would require knowing the precise proof (or at least its length), not just the statement “there exists such a proof”. But I’m probably not following what, exactly, you have in mind; could you expand on your argument?

[I should say I was wrong to say “it … cannot prove that” above—all I can say is that one particular technique for showing that it can prove it doesn’t go through.]

The setting is sufficiently confusing at this point that I give up (I’m not sure what we are talking about). I’ll try to work on a proof (not the proof) of something along the lines of this conjecture.

If the agent checked all proofs of length less than X, and none of them prove statement T, there are no proofs of T shorter than X, even if the theory is inconsistent and T is provable (therefore, by longer proofs).

I.e., that the agent won’t find (contradictory) proofs that the same action will lead to different, even higher utilities.

Right, thanks.
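The conjecture amounts to saying that exhaustive search over all candidate proofs below a length bound is complete for that bound, regardless of whether the theory is consistent. A toy illustration, with a deliberately fake stand-in for a real proof checker (the checker and the statements are made up purely to show the enumeration):

```python
from itertools import product

# Toy bounded proof search: enumerate every candidate "proof" shorter than
# the bound and check each one. The checker is a fake stand-in for a real
# PA proof checker; the point is only that exhaustive enumeration up to a
# length bound cannot miss a proof shorter than that bound.
def checker(proof, statement):
    # Stand-in rule: a "proof" of a statement is the statement written twice.
    return proof == statement * 2

def provable_below(statement, bound, alphabet="ab"):
    """True iff some string of length < bound is accepted as a proof."""
    for n in range(1, bound):
        for chars in product(alphabet, repeat=n):
            if checker("".join(chars), statement):
                return True
    return False

# "ab" has exactly one "proof", the length-4 string "abab":
print(provable_below("ab", 5))  # True: bound 5 covers lengths 1..4
print(provable_below("ab", 4))  # False: no proof of length < 4 exists
```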

(The following sounds too confusing, but making it clearer doesn’t seem straightforward. Posting just in case communication succeeds on some level.)

Payoffs are part of preference, preference is an aspect of the world program, and the world program determines the outcome. By changing the payoffs, you change the world program, and where decision-making is based on logical inferences about it, you can thus fundamentally change the shape of the possible inferences. “Nice” solutions mean parametrizing the solution by elements of the problem statement, but here the parametrization is bound up with what can be logically inferred, so expect no nice solutions.

Sounds right (but it’s a rather strong supposition).

Sorry, I wasn’t being as clear as I should have been:

The above is true as a metatheorem, but uninteresting, because as you say, the supposition is rather strong.

What I was trying to argue is that in the agent’s proof system (for concreteness, say PA), when assuming the supposition, the conclusion follows, so by the deduction theorem, PA proves “the supposition implies the conclusion”. I wanted to use this for a proof-size-bounded Löbian argument à la cousin_it.

But this does not work because in PA, the conclusion does *not* follow when assuming the supposition, as pointed out by cousin_it.

This post does solve the class of problems Eliezer was talking about (I call it the “fair” class), but my previous Löbian post used a *different* algorithm that didn’t do utility maximization. That algorithm explicitly said that it cooperates if it can find a proof that agent1()==agent2(), and that explicitness is what made the proof work in the first place.

As far as I can see right now, making the payoffs slightly non-symmetric breaks everything. If you think an individual utility-maximizing algorithm can work in general game-theoretic situations, then you have a unique canonical solution for all equilibrium-selection and bargaining problems with non-transferable utility. Judging from the literature, finding such a unique canonical solution is *extremely* unlikely.

Would I be right in thinking that this implies you can’t apply “could” to world()? I.e., “Our universe could have a green sky” wouldn’t be meaningful without some sort of metaverse program that references what we would think of as the normal world program?

Or have the utility computed also depend on some set of facts about the universe?

I think both of those would require that the agent have some way of determining the facts about the universe (I suppose they could figure it out from the source code, but that seems sort of illegitimate to me).

This post only gives a way to apply “could” to yourself. This doesn’t imply you can never apply “could” to the world; we might find another way to do that someday.

It seems like there are two kinds of “could” at work here, one that applies to yourself and is based on consistent action to utility relationships, and another that involves uncertainty as to what actions cause what utilities (based on counterfactuals about the universe).

Thanks, good point about uncertainty. I’m making a mental note to see how it relates to counterfactuals.

Following is a generalization that takes into account logical uncertainty (inability to prove difficult statements, which can in particular lead to inability to prove *any* statements of the necessary kind).

Instead of proving statements of the form “agent()==A implies world()==U”, let’s prove statements of the form “(X and agent()==A) implies world()==U”, where X ranges over unsubstantiated assumptions. There are obviously provable statements of the new form. Now, instead of looking for a statement with the highest U, let’s look, for each A, at the set of pairs (X,U) from proved statements. We can then use the inverse of the length of X to weight the corresponding utilities, and estimate the “expected utility” of performing action A by summing over all explored X, even if we are unable to complete a single proof of a statement of the original form. For the same reason as in the post, this “expected utility” calculation is valid for the action that we choose.
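One way to read this proposal is as a crude mixture over assumptions. A minimal sketch, with the theorem prover replaced by a hard-coded list of (X, A, U) triples standing for proved statements “(X and agent()==A) implies world()==U”; the assumption strings and payoffs are made up for illustration:

```python
from collections import defaultdict

# Sketch of the assumption-weighted scheme: each proved statement
# "(X and agent()==A) implies world()==U" becomes a triple (X, A, U),
# and U is weighted by the inverse length of the assumption X. A real
# version would obtain the triples from a proof search; here they are
# hard-coded.
def choose_action(proved):
    totals = defaultdict(float)   # sum of weight * U per action
    weights = defaultdict(float)  # sum of weights per action
    for x, a, u in proved:
        w = 1.0 / len(x)          # inverse length of X, as in the proposal
        totals[a] += w * u
        weights[a] += w
    # Normalized "expected utility" of each action over its explored X.
    eu = {a: totals[a] / weights[a] for a in totals}
    return max(eu, key=eu.get)

proved = [
    ("coin==heads", 1, 1000000),  # hypothetical assumptions and payoffs
    ("coin==tails", 1, 0),
    ("coin==heads", 2, 1000),
    ("coin==tails", 2, 1000),
]
print(choose_action(proved))  # action 1: its weighted utility is higher
```

Note that nothing in this sketch screens out a degenerate assumption such as “false”, which is exactly the failure mode raised in the thread.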

(This looks like a potentially dangerous algorithm.)

Edit: As cousin_it points out, it doesn’t quite work as intended, because X could just be “false”, which would enable arbitrarily high U and collapse the “expected utility” calculation.

Ouch. X could simply say “false” and the whole thing explodes in your face.

There are logics (relevant or paraconsistent ones) which do not have the classical “ex falso quodlibet”.

True, didn’t notice that. I still feel there must be a workaround.

If you sum over all explored X (and implicitly treat unexplored X as if they are zero probability) then you can be arbitrarily far from optimal Bayesian behavior.

See Fitelson’s “Bayesians sometimes cannot ignore even very implausible theories.”

http://fitelson.org/hiti.pdf

Thanks for that link. Good stuff.

What if instead of assuming your Xs, you got them out of the program?

To simplify, imagine a universe with only one fact. A function fact() returns 1 if that fact is true, and 0 if it’s not.

Now, the agent can prove statements of the form “fact()==1 and agent()==A implies world()==U”. This avoids the problem of plugging in false: the agent doesn’t start by assuming X as a statement of its own from which it can derive things; it looks at all the possible outputs of fact(), then sees how each combines with its action to produce utility.

Trivially, if the agent has access to the code of fact(), then it can just figure out what the result would be, and it would know which actions correspond to which utilities.

Otherwise, agent() could use a prior distribution, or magically infer some probability distribution over whether the fact is true or not, and then choose its action based on normal expected-utility calculations.

Note: The above departs from the structure of the world() function given here in that it assumes some way of interacting with world() (or its source code) to find out fact() without actually getting the code of fact(). Perhaps world() is called multiple times, and agent() can observe the utility it accumulates and use that (combined with its “agent()==A and fact()==1 implies world()==U” proofs) to figure out whether the fact is true.

Interestingly, while this allows the agent to apply “could” to the world (“It could rain tomorrow”), it doesn’t allow the agent to apply “could” to things that don’t affect world()’s utility calculation (“People could have nonmaterial souls that can be removed without influencing their behavior in any way”).
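A minimal sketch of this proposal, assuming the proved statements are summarized as a lookup table (each entry standing for “fact()==f and agent()==a implies world()==U”) and that the probability of fact()==1 is an assumed prior rather than anything derived; the payoff numbers are invented:

```python
# Sketch of the fact()-conditioned scheme: proved statements
# "fact()==f and agent()==a implies world()==U" are modeled as a lookup
# table, and the prior over fact() is an assumed input (the "magically
# inferred" distribution mentioned above).
def choose_action(utility, p_fact):
    """utility[(f, a)]: proved payoff U for fact()==f and agent()==a.
    p_fact: assumed probability that fact()==1."""
    actions = {a for (_, a) in utility}
    def expected(a):
        return p_fact * utility[(1, a)] + (1 - p_fact) * utility[(0, a)]
    return max(actions, key=expected)

# Toy payoffs: action 1 is good only if the fact holds; action 2 is safe.
table = {(1, 1): 100, (0, 1): -50, (1, 2): 10, (0, 2): 10}
print(choose_action(table, 0.9))  # 1: betting on the fact pays off
print(choose_action(table, 0.1))  # 2: the safe action wins
```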

Does this differ from, or contradict something in, how EY reduces “could” in his free will solution? I thought the could-issue was resolved.