Johannes Treutlein

Karma: 784

johannestreutlein.com

Johannes Treutlein 10 Jan 2017 13:22 UTC
4 points
in reply to: sen’s comment on: Which areas of rationality are underexplored? - Discussion Thread

Rationality is about more than empirical studies. It’s about developing sensible models of the world. It’s about conveying sensible models to people in ways that they’ll understand them. It’s about convincing people that your model is better than theirs, sometimes without having to do an experiment.

Hmm, I’m not sure I understand what you mean. Maybe I’m missing something? Isn’t this exactly what Bayesianism is about? Bayesianism is just using the laws of probability theory to build an understanding of the world, given all the evidence that we encounter. Of course, at its core that’s just plain math. E.g., when Albert Einstein came up with relativity, that was an insight gained without doing any experiment, but it is perfectly in accordance with Bayesianism.

Bayesian probability theory seems to be all we need to find out truths about the universe. In this framework, we can explain things like “Occam’s Razor” formally, and we can even include Popperian reasoning as a special case: a hypothesis has to concentrate probability mass on some of the outcomes in order to be useful. If you then receive evidence that would have been very unlikely given the hypothesis, you shift the hypothesis’ probability down a lot (= falsification). If you receive confirming evidence that could have been explained just as well by other theories, this only slightly shifts its probability up (see EY’s introduction). But maybe this is not the point that you were trying to make?
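To make the asymmetry concrete, here is a minimal Python sketch with made-up numbers (none of them come from the thread): a hypothesis H that puts 0.99 of its mass on outcome A, a lumped-together alternative that gives A probability 0.9, and a prior of 0.5. Observing A barely helps H, because the alternative predicted it almost as well; observing not-A nearly falsifies it.

```python
def posterior(prior_h, p_obs_given_h, p_obs_given_alt):
    """Bayes' rule for a hypothesis H against a single lumped alternative."""
    joint_h = prior_h * p_obs_given_h
    joint_alt = (1 - prior_h) * p_obs_given_alt
    return joint_h / (joint_h + joint_alt)

PRIOR = 0.5
P_A_GIVEN_H = 0.99    # H concentrates its probability mass on outcome A
P_A_GIVEN_ALT = 0.90  # the alternatives also expect A, just less sharply

confirmed = posterior(PRIOR, P_A_GIVEN_H, P_A_GIVEN_ALT)
falsified = posterior(PRIOR, 1 - P_A_GIVEN_H, 1 - P_A_GIVEN_ALT)

print(f"after observing A:     {confirmed:.3f}")  # 0.524: a slight upshift
print(f"after observing not-A: {falsified:.3f}")  # 0.091: a large downshift
```

The likelihood ratio does all the work here: 0.99/0.9 is close to 1, while 0.01/0.1 is a factor of ten against H.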

I also think that EY is not Bayesian sometimes. He often assigns something 100 per cent probability without any empirical evidence, but because simplicity and beauty of the theory. For example that MWI is correct interpretation of QM. But if you put 0 probability on something (other interpretations), it can’t be updated by any evidence.

Hmm, I’m quite confident (not 100%) that he’s just assigning a very high probability to it, since it seems to be by far the more parsimonious and computationally “shorter” explanation, but of course not 100% :) (See the Occam’s razor link above for why Bayesians give shorter explanations more a priori credence.)
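As a rough illustration of the Occam’s razor point, the sketch below assumes a Solomonoff-style prior that weights each hypothesis by $2^{-n}$, where $n$ is its description length in bits; the lengths are invented. If two hypotheses fit the evidence equally well, the data never separates them, so the shorter one keeps its prior advantage, but it never reaches probability 1.

```python
# Hypothetical description lengths in bits (illustrative, not from the thread).
LENGTHS = {"shorter": 20, "longer": 30}
# Both hypotheses are assumed to explain the evidence equally well.
LIKELIHOOD = {"shorter": 1.0, "longer": 1.0}

def normalize(weights):
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Simplicity prior: weight proportional to 2^(-length).
prior = normalize({h: 2.0 ** -n for h, n in LENGTHS.items()})
posterior = normalize({h: prior[h] * LIKELIHOOD[h] for h in prior})

odds = posterior["shorter"] / posterior["longer"]
print(f"posterior odds shorter:longer = {odds:.0f}:1")
```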

Regarding Kuhnianism: Maybe it’s a good theory of how the social progress of science works, but how does it help me have more accurate beliefs about the world? I don’t know much about it, so I would be curious about relevant information! :)

Johannes Treutlein 24 Jan 2017 19:18 UTC
3 points
in reply to: lifelonglearner’s comment on: Did EDT get it right all along? Introducing yet another medical Newcomb problem
Yes, that’s correct. I would say that “two-boxing” is generally what CDT would recommend, and “one-boxing” is what EDT recommends. Yes, medical Newcomb problems are different from Newcomb’s original problem in that there are no simulations of decisions involved in the former.
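For readers new to the setup, here is a minimal Python sketch of why the two theories come apart in Newcomb’s original problem. It assumes the usual payoffs (1,000,000 dollars in the opaque box, called box A elsewhere in this thread, if one-boxing was predicted; 1,000 dollars in the other box) and a hypothetical 99% accurate predictor. EDT treats the action as evidence about the prediction; CDT holds its credence about the box contents fixed, so two-boxing dominates for every credence.

```python
MILLION, THOUSAND = 1_000_000, 1_000
ACCURACY = 0.99  # hypothetical predictor accuracy
ACTIONS = ("one-box", "two-box")

def edt_value(action):
    """EDT: the action is evidence about what was predicted (and so about box A)."""
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_full * MILLION + (THOUSAND if action == "two-box" else 0)

def cdt_value(action, q):
    """CDT: the contents of box A are held fixed at credence q, whatever we do."""
    return q * MILLION + (THOUSAND if action == "two-box" else 0)

edt_choice = max(ACTIONS, key=edt_value)
cdt_choices = {max(ACTIONS, key=lambda a: cdt_value(a, q)) for q in (0.0, 0.5, 1.0)}
print("EDT:", edt_choice)           # one-boxes
print("CDT:", sorted(cdt_choices))  # two-boxes for every credence q tried
```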

Johannes Treutlein 26 Jan 2017 15:14 UTC
LW: 1 AF: 1
AF
in reply to: Vladimir_Nesov’s comment on: Did EDT get it right all along? Introducing yet another medical Newcomb problem
Thanks for your comment! I find your line of reasoning in the ASP problem and the Coin Flip Creation plausible. So your point is that, in both cases, by choosing a decision algorithm, one also gets to choose where this algorithm is instantiated? I would say that in the CFC, choosing the right action is sufficient, while in the ASP you also have to choose the whole UDT program so as to be instantiated in a beneficial way (similar to the distinction that TDT iterates over acts while UDT iterates over policies).

Would you agree that the Coin Flip Creation is similar to e.g. the Smoking Lesion? I could also imagine that by not smoking, UDT would become more likely to be instantiated in a world where the UDT agent doesn’t have the gene (or that the gene would eliminate (some of) the UDT agents from the worlds where they have cancer). Otherwise there couldn’t be a study showing a correlation between UDT agents’ genes and their smoking habits. If the participants of the study used a different decision theory or, unlike us, didn’t have knowledge of the results of the study, UDT would probably smoke. But in this case I would argue that EDT would do so as well, since conditioning on all of this information puts it out of the reference class of the people in the study.

One could probably generalize this kind of “likelihood of being instantiated” reasoning. My guess would be that a UDT version that takes it into account might behave according to conditional probabilities, like EDT. Take, e.g., the example from this post by Nate Soares. If there isn’t a principled difference from the Coin Flip Creation that I’ve overlooked, then UDT might reason that if it takes “green”, it will become very likely that it will be instantiated only in a world where gamma rays hit the UDT agent (since apparently, UDT agents that choose green are “eliminated” from worlds without gamma rays, or at least that’s what I have to assume if I don’t know any additional facts). Therefore our specified version of UDT takes the red box. The main argument I’m trying to make is that if you solve the problem like this, then UDT would (at least here, and possibly in all cases) become equivalent to updateless EDT. As far as I know, this would be a relief, since (u)EDT seems easier to formalize?

Johannes Treutlein 26 Jan 2017 17:07 UTC
1 point
AF
in reply to: Vaniver’s comment on: Did EDT get it right all along? Introducing yet another medical Newcomb problem

The point of decision theories is not that they let you reach from beyond the Matrix and change reality in violation of physics; it’s that you predictably act in ways that optimize for various criteria.

I agree with this. But I would argue that causal counterfactuals somehow assume that we can “reach from beyond the Matrix and change reality in violation of physics”. They work by comparing what would happen if we detached our “action node” from its ancestor nodes and manipulated it in different ways. So causal thinking seems, in some way, to violate the deterministic way the world works. Of course, all decision theories have to reason through counterfactuals somehow, so they all have to form “impossible” hypotheses. My point is that if we assume that we can have a causal influence on the future, then this is already a kind of violation of determinism, and I would argue that assuming we can also have a retro-causal influence on the past doesn’t necessarily make things worse. In some sense, it might even be more in line with how the world works: the future is as fixed as the past, and the EDT approach is merely to “find out” which past and future are true.
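The “detach the action node” move can be illustrated with the Smoking Lesion’s causal structure (Gene → Smoke, Gene → Cancer) and invented probabilities. Conditioning on smoking (EDT-style) raises the probability of the gene and hence of cancer; intervening with do(smoke) (CDT-style) cuts the arrow from Gene into Smoke, so the action says nothing about the gene. A minimal Python sketch by brute-force enumeration:

```python
from itertools import product

# Invented numbers for the Smoking Lesion structure: Gene -> Smoke, Gene -> Cancer.
P_GENE = 0.5
P_SMOKE_GIVEN_GENE = {True: 0.9, False: 0.1}
P_CANCER_GIVEN_GENE = {True: 0.8, False: 0.1}

def joint(gene, smoke, cancer, do_smoke=None):
    """P(gene, smoke, cancer); with do_smoke set, Smoke is cut off from Gene."""
    p = P_GENE if gene else 1 - P_GENE
    if do_smoke is None:
        p_s = P_SMOKE_GIVEN_GENE[gene]
        p *= p_s if smoke else 1 - p_s
    else:
        p *= 1.0 if smoke == do_smoke else 0.0
    p_c = P_CANCER_GIVEN_GENE[gene]
    return p * (p_c if cancer else 1 - p_c)

def p_cancer(smoke, intervene):
    """P(cancer | smoke) if not intervene, else P(cancer | do(smoke))."""
    do = smoke if intervene else None
    num = sum(joint(g, smoke, True, do) for g in (True, False))
    den = sum(joint(g, smoke, c, do) for g, c in product((True, False), repeat=2))
    return num / den

for smoke in (True, False):
    print(f"smoke={smoke}: conditioning {p_cancer(smoke, False):.2f}, "
          f"intervening {p_cancer(smoke, True):.2f}")
```

Under conditioning the two actions are evidence about different pasts (0.73 vs. 0.17); under intervention both give the same 0.45, because the gene’s distribution is held fixed.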

But this is a decision problem where your action has been divorced from your intended action, and so attributing the victory of heads children to EDT is mistaken, because of the tails child with EDT who wanted to two-box but couldn’t.

Hmm, I’m not sure. It seems as though in your setup, the gurus have to change the children’s decision algorithms, in which case the correlation would of course vanish. Or the children use a meta decision theory like “think about the topic, consider what the guru tells you, and then try to somehow do whatever winning means”. But if Omega created you with the intention of making you one-box or two-box, it could easily have added some rule or changed the meta theory so that you would simply end up not being convinced of the “wrong” theory. You would have magically ended up doing (and thinking) the right thing, instead of “wanting” one thing but not “being able to” do it. I mean, I am trying to convince you of some decision theory right now, and you already have some knowledge and a meta decision theory that will ultimately lead you to either adopt or reject it. Maybe the fact that you’re not yet convinced shows that you’re living in the tails world? ;) Maybe Omega’s trick is to make the tails people think about guru cases in order to get them to reject EDT?

One could maybe even object to Newcomb’s original problem on similar grounds. Imagine the prediction was made 10 years ago. In the meantime, you learned about decision theories and went to one of the gurus, and now you are confronted with the problem. Are you free to choose, or does the prediction interfere with your new, intended action, so that you can’t choose the way you want? I don’t believe so: you’ll feel just as free to choose as if the prediction had happened 10 minutes ago. Only after deciding freely do you find out that you were determined to decide this way from the beginning, because Omega of course also accounted for the guru.

In general, I tend to think that adding some “outside influence” to a Newcomb’s problem either makes it a different decision problem, or it’s irrelevant and just confuses things.

Johannes Treutlein 26 Jan 2017 17:30 UTC
LW: 1 AF: 1
AF
in reply to: cousin_it’s comment on: Did EDT get it right all along? Introducing yet another medical Newcomb problem
I agree with points 1) and 2). Regarding point 3), that’s interesting! Do you think one could also prove that if you don’t smoke, you can’t (or are less likely to) have the gene in the Smoking Lesion? (See also my response to Vladimir Nesov’s comment.)

Johannes Treutlein 30 Jan 2017 13:47 UTC
LW: 1 AF: 1
AF
in reply to: Vaniver’s comment on: Did EDT get it right all along? Introducing yet another medical Newcomb problem

I suspect this is a confusion about free will. To be concrete, I think that a thermostat has a causal influence on the future, and does not violate determinism. It deterministically observes a sensor, and either turns on a heater or a cooler based on that sensor, in a way that does not flow backwards—turning on the heater manually will not affect the thermostat’s attempted actions except indirectly through the eventual effect on the sensor.

Fair point :) What I meant was that for every world history, there is only one causal influence I could possibly have on the future. But CDT reasons through counterfactuals that are physically impossible (e.g. two-boxing in a world where there is money in box A), because it combines world states with actions it wouldn’t take in those worlds. EDT just assumes that it’s choosing between different histories, which is kind of “magical”, but at least all those histories are internally consistent. Interestingly, Proof-Based DT, for example, would probably amount to the same kind of reasoning? Anyway, it’s probably a weak point, if it is one at all, and I fully agree that the issue is orthogonal to the DT question!

I basically agree with everything else you write, and I don’t think it contradicts my main points.

Johannes Treutlein 30 Jan 2017 13:50 UTC
LW: 2 AF: 1
AF
in reply to: cousin_it’s comment on: Did EDT get it right all along? Introducing yet another medical Newcomb problem
That’s what I was trying to do with the Coin Flip Creation :) My guess: once you specify the Smoking Lesion and make it unambiguous, it ceases to be an argument against EDT.

Johannes Treutlein 3 Feb 2017 10:53 UTC
0 points
on: Is Evidential Decision Theory presumptuous?

CDT, TDT, and UDT would not give away the money because there is no causal (or acausal) influence on the number of universes.

I’m not so sure about UDT’s response. From what I’ve heard, depending on the exact formal implementation of the problem, UDT might also pay the money? If your thought experiment works via a correlation between the type of universe you live in and the decision theory you employ, then it might be a similar problem to the Coin Flip Creation. I introduced the latter decision problem in an attempt to make a less ambiguous version of the Smoking Lesion. In a comment in response to my post, cousin_it writes:

Here’s why I think egoistic UDT would one-box. From the problem setup it’s provable that one-boxing implies finding money in box A. That’s exactly the information that UDT requires for decision making (“logical counterfactual”). It doesn’t need to deduce unconditionally that there’s money in box A or that it will one-box.

One possible confounder in your thought experiment is the agent’s altruism. The agent doesn’t care about which world he lives in, but only about which worlds exist. If you reason from an “updateless”, outside perspective (like Anthropic Decision Theory), it then becomes irrelevant what you choose. This is because if you act in a way that’s only logically compatible with world A, you know you just wouldn’t have existed in the other world. A way around this would be not to be completely updateless, but instead to have already updated on the fact that you exist. In that case, your decision would have more power. “One-boxing” might also make sense if you’re just a copy-egoist and prefer to live in world A.

Johannes Treutlein 22 Feb 2017 13:50 UTC
LW: 3 AF: 1
AF
in reply to: Vladimir_Nesov’s comment on: “Betting on the Past” – a decision problem by Arif Ahmed
Thanks a lot for your elaborate reply!

(So I’m not even sure what CDT is supposed to do here, since it’s not clear that the bet is really on the past state of the world and not on truth of a proposition about the future state of the world.)

Hmm, good point. The truth of the proposition is evaluated on the basis of Alice’s action, which she can causally influence. But we could think of a Newcomblike scenario in which someone made a perfect prediction 100 years ago and wrote down a note about what state the world was in at that time. Now, instead of checking Alice’s action, we just check this note to evaluate whether the proposition is true. I think it’s then clear that CDT would “two-box”.

Given that, I don’t see what role “LDT’s algorithm already existed yesterday” plays here, and I think it’s misleading to state that “it can change yesterday’s world and make the proposition true”. Instead it can make the proposition true without changing yesterday’s world, by ensuring that yesterday’s world was always such that the proposition is true. There is no change, yesterday’s world was never different and the proposition was never false.

Sorry for the fuzzy wording! I agree that “change” is not good terminology. I was thinking about TDT and a causal graph. In that context, it might have made sense to say that TDT can “determine the output” of the decision nodes, but not that of the nature nodes that have a causal influence on the decision nodes?

Following from the preceding point, it doesn’t matter when the past state of the world is, since we are not trying to influence it, we are instead trying to influence its consequences, which are in the future.

OK, if I interpret that correctly, you would say that our proposition is also a program that references Alice’s decision algorithm, and hence we can just determine that program’s output the same way we can determine our own decision. I am totally fine with that. If we can expand this principle to all the programs that somehow reference our decision algorithms, I would be curious whether there are still differences left between this and evidential counterfactuals.

Take the thought experiment in this post, for instance: Imagine you’re an agent that always chooses the action “take the red box”. Now there is a program that checks whether there will be cosmic rays, and if so, then it changes your decision algorithm to one that outputs “take the green box”. Of course, you can still “influence” your output like all regular humans, and you can thus in some sense also influence the output of the program that changed you. By extension, you can even influence whether or not the output of the program “outer space” is “gamma rays” or “no gamma rays”. If I understand your answers to my Coin Flip Creation post correctly, this formulation would make the problem into a kind of anthropic problem again, where the algorithm would at one point “choose to output red” in order to be instantiated into the world without gamma rays. Would you agree with this, or did I get something wrong?

Johannes Treutlein 24 Feb 2017 9:44 UTC
LW: 3 AF: 1
AF
in reply to: cousin_it’s comment on: “Betting on the Past” – a decision problem by Arif Ahmed
Thanks for the link! What I don’t understand is how this works in the context of empirical and logical uncertainty. Also, it’s unclear to me how this approach relates to Bayesian conditioning. E.g., if the sentence “if a holds, then o holds” is true, doesn’t this also mean that P(o|a)=1? In that sense, proof-based UDT would just be an elaborate specification of how to assign these conditional probabilities “from the viewpoint of the original position”, i.e., updatelessly, in the context of full logical inference and knowledge of the world, including knowledge about one’s own decision algorithm. I see how this is useful, but I don’t understand how it would at any point contradict normal Bayesian conditioning.

As to your first question: if we ignore problems that involve updatelessness (or if we just stipulate that EDT always had the opportunity to precommit), I haven’t been able to find any formally specified problems where EDT and UDT diverge.

I think Caspar Oesterheld’s and my flavor of EDT would be ordinary EDT with some version of updatelessness. I’m not sure if this works, but if it turns out to be identical to UDT, then I’m not sure which of the two is better specified or easier to formalize. According to the language in Arbital’s LDT article, my EDT would differ from UDT only insofar as we use ordinary Bayesian conditioning instead of some kind of logical conditioning. So (staying in the Arbital framework), it could look something like this (P stands for whatever prior probability distribution you care about):

$\left(\arg\max_{\pi_x \in \Pi} \sum_{o_i \in O} U(o_i) \cdot P(o_i \mid \pi_x)\right)(s)$
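To make the structure of this formula concrete: the agent picks, under the prior, the policy $\pi_x$ (a map from observations to actions) with the highest evidential expected utility, and only then applies it to the current observation $s$. The sketch below is a toy, assuming a counterfactual-mugging-style problem (fair coin; on heads you are asked for 100, on tails you receive 10,000 if your policy pays when asked; $U(o)$ is just the dollar amount). Here $P(o_i \mid \pi_x)$ is written down by hand; the sketch does not show how to obtain such conditional probabilities in general.

```python
from itertools import product

OBSERVATIONS = ["asked"]   # the only observation on which the agent acts
ACTIONS = ["pay", "refuse"]

def outcome_distribution(policy):
    """Hypothetical P(o | policy) under the prior, as (probability, outcome) pairs."""
    pays = policy["asked"] == "pay"
    heads = -100 if pays else 0      # heads: the agent is asked for 100
    tails = 10_000 if pays else 0    # tails: rewarded iff the policy pays when asked
    return [(0.5, heads), (0.5, tails)]

def utility(outcome):
    return outcome  # U(o) is the dollar amount in this toy

def expected_utility(policy):
    return sum(p * utility(o) for p, o in outcome_distribution(policy))

# Pi: every mapping from observations to actions.
policies = [dict(zip(OBSERVATIONS, acts))
            for acts in product(ACTIONS, repeat=len(OBSERVATIONS))]
best_policy = max(policies, key=expected_utility)

s = "asked"               # the observation actually received
print(best_policy[s])     # apply the argmax policy to s
```

Because the argmax is taken before applying the policy to $s$, the agent pays even in the heads world, where paying looks bad after updating on the observation.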

Johannes Treutlein 24 Feb 2017 10:08 UTC
2 points
in reply to: ProofOfLogic’s comment on: Is Evidential Decision Theory presumptuous?
I agree with all of this, and I can’t understand why the Smoking Lesion is still seen as the standard counterexample to EDT.

Regarding the blackmail letter: I think that in principle, it should be possible to use a version of EDT that also chooses policies based on a prior instead of actions based on your current probability distribution. That would be “updateless EDT”, and I think it wouldn’t give in to Evidential Blackmail. So I think rather than an argument against EDT, it’s an argument in favor of updatelessness.

Johannes Treutlein 25 Feb 2017 21:13 UTC
1 point
in reply to: Vladimir_Nesov’s comment on: Is Evidential Decision Theory presumptuous?
Thanks for the reply and all the useful links!

It’s not a given that you can easily observe your existence.

It took me a while to understand this. Would you say that, for example, in the Evidential Blackmail, you can never tell whether your decision algorithm is just being simulated or whether you’re actually in the world where you received the letter, because in both cases the decision algorithm receives exactly the same evidence? So in this sense, after updating on receiving the letter, both worlds are still equally likely, and only via your decision do you find out which of those worlds are the simulated ones and which are the real ones. One can probably generalize this principle: you can never differentiate between different instantiations of your decision algorithm that have the same evidence. So when you decide what action to output conditional on receiving some sense data, you always have to decide based on your prior probabilities. Normally, this works exactly as if you first updated on this sense data and then decided. But sometimes, e.g. if your actions in one world make a difference to the other world via a simulation, the two approaches come apart. Maybe if you assign anthropic probabilities to either being a “logical zombie” or the real you, then the result would be like UDT even with updating?

What I still don’t understand is how this motivates updatelessness with regard to anthropic probabilities (e.g. if I know that I have a low index number, or in Psy Kosh’s problem, if I already know I’m the decider). I totally get how it makes sense to precommit yourself and how one should talk about decision problems instead of probabilities, how you should reason as if you’re all instantiations of your decision algorithm at once, etc. Also, intuitively I agree with sticking with the priors. But somehow I can’t get my head around what exactly is wrong about the update. Why is it wrong to assign more “caring energy” to the world in which some kind of observation that I make would have been more probable? Is it somehow wrong that it “would have been more probable”? Did I choose the wrong reference classes? Is it because in these problems, too, the worlds influence each other, so that you have to consider the impact that your decision would have on the other world as well?

Edit: Never mind, I think http://lesswrong.com/lw/jpr/sudt_a_toy_decision_theory_for_updateless/ kind of answers my question :)

Johannes Treutlein 6 Jun 2017 7:31 UTC
2 points
on: The sin of updating when you can change whether you exist

Imagine that Omega tells you that it threw its coin a million years ago, and would have turned the sky green if it had landed the other way. Back in 2010, I wrote a post arguing that in this sort of situation, since you’ve always seen the sky being blue, and every other human being has also always seen the sky being blue, everyone has always had enough information to conclude that there’s no benefit from paying up in this particular counterfactual mugging, and so there hasn’t ever been any incentive to self-modify into an agent that would pay up … and so you shouldn’t.

I think this sort of reasoning doesn’t work if you also have a precommitment regarding logical facts. Then you know the sky is blue, but you don’t know what that implies. When Omega informs you about the logical connection between sky color, your actions, and your payoff, you won’t update on this logical fact. This information is one implication away from the logical prior you precommitted yourself to. And the best policy given this prior, which contains information about sky color but not about this blackmail, is not to pay: a priori, not paying will just change the situation in which you will be blackmailed (hence, what the blue sky color means), but not the probability of a positive intelligence explosion in the first place. Knowing or not knowing the color of the sky doesn’t make a difference, as long as we don’t know what it implies.

(HT Lauro Langosco for pointing this out to me.)

Johannes Treutlein 10 Jul 2017 17:48 UTC
LW: 6 AF: 5
AF
on: Smoking Lesion Steelman
From my perspective, I don’t think it’s been adequately established that we should prefer updateless CDT to updateless EDT

I agree with this.

It would be nice to have an example which doesn’t arise from an obviously bad agent design, but I don’t have one.

I’d also be interested in finding such a problem.

I am not sure whether your smoking lesion steelman actually makes a decisive case against evidential decision theory. If an agent knows about their utility function on some level, but not on the epistemic level, then this can just as well be made into a counter-example to causal decision theory. For example, consider a decision problem with the following payoff matrix:

| Type | Action | Killed | Not killed |
|---|---|---|---|
| Smoke-lover | Smokes | 10 | −90 |
| Smoke-lover | Doesn’t smoke | 0 | 0 |
| Non-smoke-lover | Smokes | −100 | −100 |
| Non-smoke-lover | Doesn’t smoke | 0 | 0 |

For some reason, the agent doesn’t care whether they live or die. Also, let’s say that smoking makes a smoke-lover happy, but afterwards, they get terribly sick and lose 100 utilons. So they would only smoke if they knew they were going to be killed afterwards. The non-smoke-lover doesn’t want to smoke in any case.

Now, smoke-loving evidential decision theorists rightly choose smoking: they know that robots with a non-smoke-loving utility function would never have any reason to smoke, no matter which probabilities they assign. So if they end up smoking, then this means they are certainly smoke-lovers. It follows that they will be killed, and conditional on that state, smoking gives 10 more utility than not smoking.

Causal decision theory, on the other hand, seems to recommend a suboptimal action. Let $a_1$ be smoking, $a_2$ not smoking, $S_1$ being a smoke-lover, and $S_2$ being a non-smoke-lover. Moreover, say the prior probability $P(S_1)$ is $0.5$. Then, for a smoke-loving CDT bot, the expected utility of smoking is just

$E[U \mid a_1] = P(S_1) \cdot U(S_1 \land a_1) + P(S_2) \cdot U(S_2 \land a_1) = 0.5 \cdot 10 + 0.5 \cdot (-90) = -40$,

which is less than the certain $0$ utilons for $a_2$. Assigning a credence of around $1$ to $P(S_1 \mid a_1)$, a smoke-loving EDT bot calculates

$E[U \mid a_1] = P(S_1 \mid a_1) \cdot U(S_1 \land a_1) + P(S_2 \mid a_1) \cdot U(S_2 \land a_1) \approx 1 \cdot 10 + 0 \cdot (-90) = 10$,

which is higher than the expected utility of $a_2$.
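The two calculations above can be reproduced in a few lines of Python. This is a sketch that encodes the smoke-lover’s payoffs from the table and, for EDT, sets $P(S_1 \mid a_1) = 1$ (the text says “around 1”), leaving $P(S_1 \mid a_2)$ at the prior; that choice does not matter here, since not smoking pays 0 in both states.

```python
# Smoke-lover's utilities: S1 = smoke-lover (killed), S2 = non-smoke-lover (not killed).
U = {("S1", "a1"): 10, ("S2", "a1"): -90,
     ("S1", "a2"): 0,  ("S2", "a2"): 0}
ACTIONS = ("a1", "a2")  # a1 = smoke, a2 = don't smoke

def expected_utility(p_s1, action):
    return p_s1 * U[("S1", action)] + (1 - p_s1) * U[("S2", action)]

PRIOR_S1 = 0.5
# CDT: the action carries no evidence about S, so it uses the prior.
cdt = {a: expected_utility(PRIOR_S1, a) for a in ACTIONS}
# EDT: smoking reveals that the agent is a smoke-lover.
P_S1_GIVEN = {"a1": 1.0, "a2": PRIOR_S1}
edt = {a: expected_utility(P_S1_GIVEN[a], a) for a in ACTIONS}

print("CDT:", cdt, "->", max(cdt, key=cdt.get))
print("EDT:", edt, "->", max(edt, key=edt.get))
```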

The reason CDT fails here doesn’t seem to lie in a mistaken causal structure. Also, I’m not sure whether the problem for EDT in the smoking lesion steelman is really that it can’t condition on all its inputs. If EDT can’t condition on something, then EDT doesn’t account for this information, but this doesn’t seem to be a problem per se.

In my opinion, the problem lies in an inconsistency in the expected utility equations. Smoke-loving EDT bots calculate the probability of being a non-smoke-lover, but then the utility they get is actually the one from being a smoke-lover. For this reason, they can get some “back-handed” information about their own utility function from their actions. The agents basically fail to condition two factors of the same product on the same knowledge.

Say we don’t know our own utility function on an epistemic level. Ordinarily, we would calculate the expected utility of an action, both as smoke-lovers and as non-smoke-lovers, as follows:

$E[U \mid a] = P(S_1 \mid a) \cdot E[U \mid S_1, a] + P(S_2 \mid a) \cdot E[U \mid S_2, a]$,

where, if $U_1$ ($U_2$) is the utility function of a smoke-lover (non-smoke-lover), $E[U \mid S_i, a]$ is equal to $E[U_i \mid a]$. In this case, we don’t get any information about our utility function from our own action, and hence, no Newcomb-like problem arises.

I’m unsure whether there is any causal decision theory derivative that gets my case (or all other possible cases in this setting) right. It seems like as long as the agent isn’t certain to be a smoke-lover from the start, there are still payoffs for which CDT would (wrongly) choose not to smoke.
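The last point can be checked with a small generalization of the example (the parameter names are illustrative, not from the post): if smoking yields a gain $g$ for a smoke-lover who is killed and a loss $\ell$ for one who is not, a CDT bot with prior $p = P(S_1) < 1$ smokes only if $p g - (1-p)\ell > 0$, i.e. only if $\ell < p g / (1-p)$. So for any such $p$, a large enough loss makes CDT abstain, while the EDT bot, which is certain it will be killed conditional on smoking, still gets $g > 0$ from smoking. A Python sketch:

```python
def cdt_smokes(p_s1, gain_if_killed, loss_if_not_killed):
    """A smoke-loving CDT bot with prior p_s1 smokes iff EU(a1) > EU(a2) = 0."""
    return p_s1 * gain_if_killed - (1 - p_s1) * loss_if_not_killed > 0

def deterring_loss(p_s1, gain_if_killed):
    """Any loss above this value makes CDT abstain (requires p_s1 < 1)."""
    return p_s1 * gain_if_killed / (1 - p_s1)

# The example above: p = 0.5, gain 10, loss 90 -> CDT abstains (threshold 10).
print(cdt_smokes(0.5, 10, 90), deterring_loss(0.5, 10))
# Even a bot that is 99% sure it is a smoke-lover abstains once the loss exceeds ~990.
print(cdt_smokes(0.99, 10, 1000), cdt_smokes(0.99, 10, 900))
```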

Johannes Treutlein 21 Aug 2017 12:55 UTC
LW: 3 AF: 3
AF
in reply to: abramdemski’s comment on: Smoking Lesion Steelman
Thanks for your answer! This “gain” approach seems quite similar to what Wedgwood (2013) has proposed as “Benchmark Theory”, which behaves like CDT in cases with causally dominant actions, but more like EDT in cases without them. My hunch would be that one might be able to construct a series of thought experiments in which such a theory violates transitivity of preference, as demonstrated by Ahmed (2012).

I don’t understand how you arrive at a gain of 0 for not smoking as a smoke-lover in my example. I would think the gain for not smoking is higher:

$\mathrm{Gain}(a_2) = E[U \mid a_2] - E[U \mid a_2, \mathrm{do}(a_1)] = P(S_1 \mid a_2) \cdot U(S_1 \land a_2) + P(S_2 \mid a_2) \cdot U(S_2 \land a_2) - P(S_1 \mid a_2) \cdot U(S_1 \land a_1) - P(S_2 \mid a_2) \cdot U(S_2 \land a_1)$

$= P(S_1 \mid a_2) \cdot (-10) + P(S_2 \mid a_2) \cdot 90 = P(S_1 \mid a_2) \cdot (-100) + 90$.

So as long as $P(S_1 \mid a_2) < 0.8$, the gain of not smoking is actually higher than that of smoking (which is $10$, given $P(S_1 \mid a_1) \approx 1$). For example, given prior probabilities of 0.5 for either state, the equilibrium probability of being a smoke-lover given not smoking will be 0.5 at most (in the case in which none of the smoke-lovers smoke).
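The comparison can be checked numerically. A minimal Python sketch of the gain formula above, assuming $P(S_1 \mid a_1) = 1$ as in the earlier EDT calculation, so that the gain of smoking is $10$; the gain of not smoking falls to that level only at $P(S_1 \mid a_2) = 0.8$:

```python
U = {("S1", "a1"): 10, ("S2", "a1"): -90,
     ("S1", "a2"): 0,  ("S2", "a2"): 0}

def gain(action, alternative, p_s1_given_action):
    """Gain(a) = E[U | a] - E[U | a, do(alternative)]; both terms use P(S | a)."""
    p = p_s1_given_action
    actual = p * U[("S1", action)] + (1 - p) * U[("S2", action)]
    counterfactual = p * U[("S1", alternative)] + (1 - p) * U[("S2", alternative)]
    return actual - counterfactual

gain_smoke = gain("a1", "a2", 1.0)  # 10
for p in (0.0, 0.5, 0.8, 1.0):
    gain_abstain = gain("a2", "a1", p)  # 90 - 100 p
    print(f"P(S1|a2)={p}: Gain(a2)={gain_abstain:.1f} vs Gain(a1)={gain_smoke:.1f}")
```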

Johannes Treutlein 26 Sep 2017 10:38 UTC
4 points
in reply to: Stuart_Armstrong’s comment on: Naturalized induction – a challenge for evidential and causal decision theory
EDT doesn’t pay if it is given the choice to commit to not paying ex-ante (before receiving the letter). So the thought experiment might be an argument against ordinary EDT, but not against updateless EDT. If one takes the possibility of anthropic uncertainty into account, then even ordinary EDT might not pay the blackmailer. See also Abram Demski’s post about the Smoking Lesion. Ahmed and Price defend EDT along similar lines in a response to a related thought experiment by Frank Arntzenius.

Johannes Treutlein 31 Mar 2018 15:37 UTC
1 point
on: Announcement: AI alignment prize winners and next round
I would like to submit the following entries:

A typology of Newcomblike problems (philosophy paper, co-authored with Caspar Oesterheld).

A wager against Solomonoff induction (blog post).

Three wagers for multiverse-wide superrationality (blog post).

UDT is “updateless” about its utility function (blog post). (I think this post is hard to understand. Nevertheless, if anyone finds it intelligible, I would be interested in their thoughts.)

Johannes Treutlein 28 Aug 2018 16:54 UTC
LW: 3 AF: 2
AF
on: Two Notions of Best Response
Wolfgang Spohn develops the concept of a “dependency equilibrium” based on a similar notion of evidential best response (Spohn 2007, 2010). A joint probability distribution $P$ is a dependency equilibrium if all actions of all players that have positive probability are evidential best responses. In case there are actions with zero probability, one evaluates a sequence $(P^{(i)})_{i \in \mathbb{N}}$ of joint probability distributions such that $\lim_{i \to \infty} P^{(i)} = P$ and $P^{(i)}(a) \neq 0$ for all actions $a$ and $i \in \mathbb{N}$. Using your notation of a probability matrix and a utility matrix, the expected utility of an action $a_j$ is then defined as the limit of the conditional expected utilities, $\lim_{i \to \infty} \frac{U_j P_j^{(i)}}{|P_j^{(i)}|}$ (which is defined for all actions).

Say $P$ is a probability matrix with only one zero column, $P_j$. It seems that you can choose an arbitrary nonzero, nonnegative vector $Q_j$ with $|Q_j| = 1$ to construct, e.g., a sequence of probability matrices $\left(\frac{i-1}{i} P + \left[0, \dots, 0, \frac{1}{i} Q_j, 0, \dots, 0\right]\right)_{i \in \mathbb{N}}$. The expected utilities in the limit for all other actions and the actions of the opponent shouldn’t be influenced by this change. So you could choose $Q_j$ as the standard vector $e_k$, where $k$ is an index such that $U_{j,k} = \min U_j$. The expected utility of $a_j$ would then be $\min U_j$. Hence, this definition of best response in case there are actions with zero probability probably coincides with yours (at least for actions with positive probability; Spohn is not concerned with the question of whether a zero-probability action is a best response or not).

The whole thing becomes more complicated with several zero rows and columns, but I would think it should be possible to construct sequences of distributions which work in that case as well.
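The one-zero-column construction can be checked on a small example. The sketch below assumes some conventions of its own: `P[k][j]` is the joint probability that the opponent plays $k$ and we play $a_j$ (so column $j$ is $P_j$), `U[j][k]` is our utility, and $|\cdot|$ is the sum of entries; the 2x2 game is hypothetical. Exact fractions avoid rounding. Perturbing the zero column towards $e_k$ for the worst response $k$ gives $\min U_j$ for every $i$, and leaves the conditional expected utility of the other action unchanged.

```python
from fractions import Fraction

def perturbed(P, j, Q, i):
    """P^(i) = (i-1)/i * P + (1/i) * [0, ..., Q, ..., 0] with Q in column j (needs i >= 2)."""
    w = Fraction(i - 1, i)
    out = [[w * x for x in row] for row in P]
    for k, q in enumerate(Q):
        out[k][j] += Fraction(1, i) * q
    return out

def conditional_eu(U, P, j):
    """U_j P_j / |P_j|: expected utility of a_j conditional on a_j being played."""
    column = [row[j] for row in P]
    return sum(U[j][k] * column[k] for k in range(len(column))) / sum(column)

# Hypothetical 2x2 game: rows k = opponent's actions, columns j = our actions.
P = [[Fraction(3, 5), Fraction(0)],
     [Fraction(2, 5), Fraction(0)]]   # column j = 1 has probability zero
U = [[3, 1],                          # U[0][k]: utilities of a_0
     [5, 2]]                          # U[1][k]: utilities of a_1

j = 1
k_worst = min(range(len(U[j])), key=lambda k: U[j][k])
Q = [1 if k == k_worst else 0 for k in range(len(P))]   # Q_j = e_k

for i in (2, 10, 1000):
    P_i = perturbed(P, j, Q, i)
    assert sum(sum(row) for row in P_i) == 1   # still a probability distribution
    print(i, conditional_eu(U, P_i, 1), conditional_eu(U, P_i, 0))
```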

Johannes Treutlein 8 Jun 2022 21:53 UTC
LW: 2 AF: 1
AF
in reply to: John Schulman’s comment on: Intuitions about solving hard problems
I’d also be curious about this!

Johannes Treutlein 22 Jun 2022 23:41 UTC
LW: 2 AF: 1
AF
in reply to: Johannes Treutlein’s comment on: Intuitions about solving hard problems
I find this particularly curious since naively, one would assume that weight sharing implicitly implements a simplicity prior, so it should make optimization more likely and thus also deceptive behavior? Maybe the argument is that somehow weight sharing leaves less wiggle room for obscuring one’s reasoning process, making a potential optimizer more interpretable? But the hidden states and tied weights could still be encoding deceptive reasoning in an uninterpretable way?