To me it feels like the natural place to draw the line is update-on-computations but updateless-on-observations.
A first problem with this is that there is no sharp distinction between purely computational (analytic) information/observations and purely empirical (synthetic) information/observations. This is a deep philosophical point, well-known in the analytic philosophy literature, and best represented by Quine’s “Two Dogmas of Empiricism” and his idea of the “Web of Belief”. (This is also related to Radical Probabilism.) But it’s unclear whether this philosophical problem translates into a pragmatic one. So let’s just assume that the laws of physics are such that all superintelligences we care about converge on the same classification of computational vs empirical information.
A second and more worrying problem is that, even given such convergence, it’s not clear all other agents will decide to forego the possible apparent benefits of logical exploitation. It’s a kind of Nash equilibrium selection problem: If I was very sure all other agents forego them (and have robust cooperation mechanisms that deter exploitation), then I would just do like them. And indeed, it’s conceivable that our laws of physics (and algorithmics) are such that this is the case, and all superintelligences converge on the Schelling point of “never exploiting the learning of logical information”. But my probability of that is not very high, especially due to worries that different superintelligences might start with pretty different priors, and make commitments early on (before all posteriors have had time to converge). (That said, my probability is high that almost all deliberation is mostly safe, for more contingent reasons related to the heuristics they use and the values they have.) You might also want to say something like “they should just use the correct decision theory to converge on the nicest Nash equilibrium!”. But that’s question-begging, because the worry is exactly that others might have different notions of this normative “nice” (indeed, there may be no objective criterion for choosing a decision theory). The problem recurs: we can’t just invoke a decision theory to decide on the correct decision theory.
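To spell out the equilibrium-selection structure I have in mind, here’s a toy 2x2 game (the action names and payoff numbers are made up purely for illustration, they’re not from the post): each agent chooses to “forego” or “exploit” logical information, and both mutual foregoing (with deterrence) and mutual exploitation are self-enforcing, so which equilibrium gets selected doesn’t answer itself.

```python
# Toy "forego vs exploit logical information" coordination game.
# Payoff numbers are purely illustrative.

ACTIONS = ["forego", "exploit"]

# payoffs[(row_action, col_action)] = (row_payoff, col_payoff)
payoffs = {
    ("forego", "forego"): (3, 3),    # everyone foregoes exploitation and deters it
    ("forego", "exploit"): (0, 2),   # you forego, the other exploits you
    ("exploit", "forego"): (2, 0),
    ("exploit", "exploit"): (1, 1),  # mutual exploitation / commitment-race mess
}

def best_responses(player: int, opponent_action: str) -> list[str]:
    """Actions maximizing this player's payoff against a fixed opponent action."""
    def payoff_of(action: str) -> float:
        profile = (action, opponent_action) if player == 0 else (opponent_action, action)
        return payoffs[profile][player]
    best = max(payoff_of(a) for a in ACTIONS)
    return [a for a in ACTIONS if payoff_of(a) == best]

pure_nash = [
    (r, c)
    for r in ACTIONS
    for c in ACTIONS
    if r in best_responses(0, c) and c in best_responses(1, r)
]
print(pure_nash)  # [('forego', 'forego'), ('exploit', 'exploit')]: two self-enforcing conventions
```

The point is just that “forego” is only clearly best if you’re already confident enough that the others are playing it.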
Am I missing something about why logical counterfactual muggings are likely to be common?
As mentioned in the post, Counterfactual Mugging as presented won’t be common, but equivalent situations in multi-agentic bargaining might, due to (the naive application of) some priors leading to commitment races. (And here “naive” doesn’t mean “shooting yourself in the foot”, but rather “doing what looks best from the prior”, even if, unbeknownst to you, it has dangerous consequences.)
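And to make “what looks best from the prior” concrete, here’s the standard Counterfactual Mugging arithmetic (the usual illustrative $100 / $10,000 numbers, with a “logical coin” such as the parity of an uncomputed digit of pi):

```python
# Counterfactual Mugging with a logical coin, evaluated from the prior.
# Omega asks for $100 if the coin is tails, and pays $10,000 on heads iff it
# predicts you would have paid on tails. Numbers are the usual illustrative ones.

P_HEADS = 0.5  # prior probability assigned to the not-yet-computed logical fact

def prior_expected_value(pays_when_asked: bool) -> float:
    heads_payoff = 10_000 if pays_when_asked else 0  # rewarded only if predicted to pay
    tails_payoff = -100 if pays_when_asked else 0    # on tails you actually pay
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff

print(prior_expected_value(True))   # 4950.0: committing to pay looks great ex ante
print(prior_expected_value(False))  # 0.0

# After computing the coin and seeing tails, paying is a sure -100, so an agent
# that updates on the computation refuses; that gap is what makes early (possibly
# hasty) commitments from the prior look attractive.
```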
if it comes up it seems that an agent that updates on computations can use some precommitment mechanism to take advantage of it
It’s not looking like something as simple as that will solve the problem, because of the reasoning in this paragraph:
Unfortunately, it’s not that easy, and the problem recurs at a higher level: your procedure to decide which information to use will depend on all the information, and so you will already lose strategicness. Or, if it doesn’t depend, then you are just being updateless, not using the information in any way.
Or in other words, you need to decide on the precommitment ex ante, when you still haven’t thought much about anything, so your precommitment might be bad. (Although to be fair there are ongoing discussions about this.)
A first problem with this is that there is no sharp distinction between purely computational (analytic) information/observations and purely empirical (synthetic) information/observations.
I don’t see the fuzziness here, even after reading the Two Dogmas Wikipedia page (though without really understanding it; it’s hidden behind a wall of jargon). If we have some prior over universes, and some observation channel, we can define an agent that is updateless with respect to that prior, and updateful with respect to any calculations it performs internally (a toy sketch of what I mean is below). Is there a section of Radical Probabilism that is particularly relevant? It’s been a while.

It’s not clear to me why all superintelligences having the same classification matters. They can communicate about edge cases and differences in their reasoning. Do you have an example here?
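Here’s that sketch (a minimal toy; the universes, observations and payoffs are all made up): the agent picks a whole policy (observation → action) by expected utility under its prior over universes, i.e. it never updates on observations, but it freely runs and uses whatever internal computations it needs while evaluating policies.

```python
# Toy agent: updateless over an empirical prior, updateful over internal computation.

from itertools import product

UNIVERSES = {"u1": 0.6, "u2": 0.4}               # prior over universes (empirical uncertainty)
OBSERVATION_OF = {"u1": "obs_a", "u2": "obs_b"}  # observation channel
OBSERVATIONS = ["obs_a", "obs_b"]
ACTIONS = ["act_x", "act_y"]

def utility(universe: str, action: str) -> float:
    # The agent may run arbitrary computation here and just use the result:
    # it holds no fixed "prior over logical facts" while evaluating outcomes.
    logical_fact = pow(2, 10, 7)  # a stand-in for some computed logical fact (= 2)
    bonus = 1.0 if logical_fact == 2 else 0.0
    table = {("u1", "act_x"): 3, ("u1", "act_y"): 1,
             ("u2", "act_x"): 0, ("u2", "act_y"): 4}
    return table[(universe, action)] + bonus

def prior_expected_utility(policy: dict) -> float:
    """Score a whole observation->action policy from the prior (no updating on obs)."""
    return sum(p * utility(u, policy[OBSERVATION_OF[u]]) for u, p in UNIVERSES.items())

policies = [dict(zip(OBSERVATIONS, combo)) for combo in product(ACTIONS, repeat=len(OBSERVATIONS))]
best = max(policies, key=prior_expected_utility)
print(best, prior_expected_utility(best))  # {'obs_a': 'act_x', 'obs_b': 'act_y'} 4.4
```

(In a toy like this, with no predictors or entanglement, being updateless about observations doesn’t change the answer; the sketch is only meant to show where the prior sits versus where computation gets used freely.)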
A second and more worrying problem is that, even given such convergence, it’s not clear all other agents will decide to forego the possible apparent benefits of logical exploitation. It’s a kind of Nash equilibrium selection problem: If I was very sure all other agents forego them (and have robust cooperation mechanisms that deter exploitation), then I would just do like them.
I think I don’t understand why this is a problem. So what if there are some agents running around being updateless about logic? What’s the situation that we are talking about a Nash equilibrium for?
As mentioned in the post, Counterfactual Mugging as presented won’t be common, but equivalent situations in multi-agentic bargaining might, due to (the naive application of) some priors leading to commitment races.
Can you point me to an example in bargaining that motivates the usefulness of logical updatelessness? My impression of that section wasn’t “here is a realistic scenario that motivates the need for some amount of logical updatelessness”, it felt more like “logical bargaining is a situation where logical updatelessness plausibly leads to terrible and unwanted decisions”.
It’s not looking like something as simple as that will solve the problem, because of the reasoning in this paragraph:
Unfortunately, it’s not that easy, and the problem recurs at a higher level: your procedure to decide which information to use will depend on all the information, and so you will already lose strategicness. Or, if it doesn’t depend, then you are just being updateless, not using the information in any way.
Or in other words, you need to decide on the precommitment ex ante, when you still haven’t thought much about anything, so your precommitment might be bad.
Yeah, I wasn’t thinking of that as a “solution”; I’m biting the bullet of losing some potential value and having a decision theory that doesn’t satisfy all the desiderata. I was just saying that in some situations, such an agent can patch the problem using other mechanisms, just as an EDT agent can try to implement some external commitment mechanism if it lives in a world full of transparent Newcomb problems.
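For concreteness, the kind of patch I mean, with the usual transparent Newcomb numbers ($1,000 / $1,000,000; a toy sketch, not any specific proposal):

```python
# Transparent Newcomb with an accurate predictor: the big box is filled iff the
# predictor expects you to one-box even while seeing it full. Numbers are the
# usual illustrative ones.

SMALL, BIG = 1_000, 1_000_000

def payoff(would_one_box_on_full: bool) -> int:
    big_box_filled = would_one_box_on_full  # accurate prediction of your disposition
    if big_box_filled:
        return BIG   # you see a full box and (by assumption) take only it
    return SMALL     # you only ever see an empty big box, so you grab the small one

print(payoff(False))  # 1000: an updateful agent with no commitment device two-boxes on "full"
print(payoff(True))   # 1000000: the same agent after binding itself with an external mechanism
```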
(Sorry, short on time now, but we can discuss in-person and maybe I’ll come back here to write the take-away)