This doesn’t quite seem right, because just multiplying probabilities only works when all the quantities are independent. Here, I’d put higher odds on someone having the ability to recognize a worthwhile result conditional on them having the ability to work on a problem than on them having that ability unconditionally, so the product of the probabilities will be higher than it seems at first.
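As a rough illustration with made-up numbers (none of these values come from the discussion), the chain rule gives a larger joint probability than the naive product whenever the two abilities are positively correlated:

```python
# Hypothetical probabilities, chosen only to illustrate the point.
p_work = 0.1                  # P(can work on the problem)
p_recognize = 0.1             # P(can recognize a worthwhile result), unconditionally
p_recognize_given_work = 0.5  # P(recognize | can work): assumed positively correlated

# Naive product treats the two abilities as independent.
naive = p_work * p_recognize
# Chain rule: P(work and recognize) = P(work) * P(recognize | work).
joint = p_work * p_recognize_given_work

print(f"naive product: {naive:.3f}, chain rule: {joint:.3f}")
```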
I’m unsure whether this consideration affects whether the distribution would be lognormal or not.
(lightly edited restatement of email comment)
Let’s see what happens when we adapt this to the canonical instance of “no, really, counterfactuals aren’t conditionals and should have different probabilities”: the cosmic ray problem. The agent chooses between two paths and slightly prefers the left one, but its conditional on taking the right path is a tiny slice of probability mass that’s mostly composed of stuff like “I took the suboptimal action because I got hit by a cosmic ray”.
Say there is 0 utility for taking the left path, −10 utility for taking the right path, and −1000 utility for a cosmic ray hit. The CDT counterfactual says 0 utility for the left path and −10 utility for the right path, while the conditional says 0 utility for the left path and −1010 utility for the right path (because conditional on taking the right path, you were hit by a cosmic ray).
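A minimal sketch of the two valuations, using the utilities above. The function names and the `p_ray_given_right` parameter are hypothetical, and the conditional is idealized so that taking the right path implies a cosmic ray hit:

```python
# Utilities from the cosmic ray problem as stated above.
U_LEFT = 0
U_RIGHT = -10
U_COSMIC_RAY = -1000

def cdt_value(action):
    """Counterfactual value: intervening on the action leaves P(cosmic ray) alone."""
    return U_LEFT if action == "left" else U_RIGHT

def conditional_value(action, p_ray_given_right=1.0):
    """Conditional value. Assumes (idealized) that taking the right path is
    explained by a cosmic ray hit with probability p_ray_given_right."""
    if action == "left":
        return U_LEFT
    return U_RIGHT + p_ray_given_right * U_COSMIC_RAY

for action in ("left", "right"):
    print(action, cdt_value(action), conditional_value(action))
```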
In order to get the Dutch book to go through, we need to get the agent to take the right path, to exploit P(cosmic ray) changing between decision time and afterwards. So the initial bet could be something like −1 utility now, +12 utility upon taking the right path and not being hit by a cosmic ray. But since the optimal action is now “take the right path along with the bet”, the problem setup has changed, and we can’t conclude that the agent’s conditional on taking the right path places high probability on getting hit by a cosmic ray (because now the right path is the optimal action), so we can’t money-pump with the “+0.5 utility, −12 utility upon a cosmic ray hit” bet.
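The arithmetic behind the bet can be checked with a short sketch. The helper names are hypothetical; the payoffs are the ones from the paragraph above, and the cosmic ray flag only affects whether the bet pays out (the −1000 term is left out). Once the bet is available, right path plus bet beats every other option:

```python
# Path utilities and the initial bet: -1 utility now,
# +12 utility upon taking the right path without a cosmic ray hit.
PATH_UTILITY = {"left": 0, "right": -10}
BET_COST = 1
BET_PAYOUT = 12

def payoff(path, take_bet, cosmic_ray=False):
    """Total utility of a path/bet combination. The -1000 cosmic ray utility is
    omitted; the cosmic_ray flag only decides whether the bet pays out."""
    total = PATH_UTILITY[path]
    if take_bet:
        total -= BET_COST
        if path == "right" and not cosmic_ray:
            total += BET_PAYOUT
    return total

options = [(path, bet) for path in ("left", "right") for bet in (False, True)]
best = max(options, key=lambda o: payoff(*o))
print(best, payoff(*best))  # ('right', True) 1
```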
So this seems to Dutch-book Death-in-Damascus, not CDT≠EDT cases in general.
Yes, UDT means updateless decision theory, “the policy” is used as a placeholder for “whatever policy the agent ends up picking”, much like a variable in an equation, and “the algorithm I wrote” is still unpublished because there were too many things wrong with it for me to be comfortable putting it up, as I can’t even show it has any nice properties in particular. Although now that you mention it, I probably should put it up so future posts about what’s wrong with it have a well-specified target to shoot holes in. >_>
It actually is a weakening. Because, under standard Pareto optimality, every change can be interpreted as making some player worse off, the second condition means that more changes count as improvements, as you correctly state. The third condition cuts down on which changes count as improvements, but the combination of conditions 2 and 3 still labels some changes as improvements that wouldn’t be improvements under the old concept of Pareto optimality.
The definition of an almost stratified Pareto optimum was adapted from this, and was developed specifically to address the infinite game in that post involving a non-well-founded chain of players, where nothing is a stratified Pareto optimum for all players. Something isn’t stratified Pareto optimal in a vacuum; it’s stratified Pareto optimal for a particular player. There’s no oracle that’s stratified Pareto optimal for all players, but if you take the closure of everyone’s SPO sets first to produce a set of ASPO oracles for every player, and take the intersection of all those sets, there are points which are ASPO for everyone.
My initial inclination is to introduce $X_n$ as the space of events on turn $n$, and define $X_{a:b} := \prod_{i=a}^{b} X_i$; then you can express it as $\sum_{\sigma \in X_{k+2:k+n}} P_n(x_{k+1}, \sigma \mid x_0 \ldots x_k)$.
The notation for the sum operator is unclear. I’d advise writing the sum over $i = k+2, \ldots, k+n$ and using an $i$ subscript inside the sum so it’s clearer what is being substituted where.
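The sum over $\sigma \in X_{k+2:k+n}$ just marginalizes out turns $k+2$ through $k+n$, so it should reduce to the one-step conditional $P(x_{k+1} \mid x_0 \ldots x_k)$. A minimal sketch, assuming (hypothetically) a two-element event space per turn and a Markov model standing in for $P_n$; neither assumption comes from the original post:

```python
from itertools import product

# Hypothetical per-turn event space X_i = {0, 1} (same for every turn).
X = (0, 1)

# Hypothetical Markov transition probabilities P(next | prev).
TRANS = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}

def seq_prob(future, history):
    """P(future | history) under the Markov model: product of transitions."""
    prev = history[-1]
    p = 1.0
    for x in future:
        p *= TRANS[prev][x]
        prev = x
    return p

def x_range(a, b):
    """X_{a:b} = X_a x ... x X_b, as tuples of length b - a + 1 (empty if b < a)."""
    return product(X, repeat=b - a + 1)

def marginal_next(x_next, history, n):
    """Sum over sigma in X_{k+2:k+n} of P_n(x_{k+1}, sigma | x_0 ... x_k)."""
    k = len(history) - 1
    return sum(seq_prob((x_next,) + sigma, history)
               for sigma in x_range(k + 2, k + n))

history = (0, 1, 1)  # x_0 .. x_k with k = 2
# Should match the one-step probability TRANS[1][1] up to float rounding.
print(marginal_next(1, history, n=4))
```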
Wasn’t there a fairness/continuity condition in the original ADT paper that if there were two “agents” that converged to always taking the same action, then the embedder would assign them the same value? (More specifically, if $\mathbb{E}_t(|A_t - B_t|) < \delta$, then $\mathbb{E}_t(|E_t(A_t) - E_t(B_t)|) < \epsilon$.) This would mean that it’d be impossible to have $\mathbb{E}_t(E_t(\mathrm{ADT}_{t,\epsilon}))$ be low while $\mathbb{E}_t(E_t(\mathrm{straight}_t))$ is high, so the argument still goes through.
Although, after this whole line of discussion, I’m realizing that there are enough substantial differences between the original formulation of ADT and the thing I wrote up that I should probably clean up this post a bit and clarify more about what’s different in the two formulations. Thanks for that.
In the ADT paper, the asymptotic dominance argument is about the limit of the agent’s action as epsilon goes to 0. This limit is not necessarily computable, so the embedder can’t contain the agent, since it doesn’t know epsilon. So the evil problem doesn’t work.
Agreed that the evil problem doesn’t work for the original ADT paper. In the original ADT paper, the agents are allowed to output distributions over moves. I didn’t like this because it implicitly assumes that it’s possible for the agent to perfectly randomize, and I think randomization is better modeled by a (deterministic) action that consults an environmental random-number generator, which may be correlated with other things.
What I meant was that, in the version of argmax that I set up, if $\mathcal{A}$ consists of the two constant policies “take blank box” and “take shiny box”, then for the embedder $F$ where the opponent runs argmax to select which box to fill, the argmax agent will converge to deterministically randomizing between the two policies: the logical inductor assigns very similar expected utility to both options, so it can’t predict which action will be chosen. This occurs because the inductor outputting more of “take the blank box” will make $F(\text{shiny})$ converge to a higher expected value (so argmax will learn to copy that), and the inductor outputting more of “take the shiny box” will make $F(\text{blank})$ converge to a higher expected value (so argmax will learn to copy that).
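A toy model of this dynamic, offered only as a hypothetical stand-in: the logical inductor’s expectations are replaced by raw counts of past choices, and the opponent’s argmax is collapsed into the rule that each box looks better the more often the other one is taken. Deterministic argmax then alternates, so each policy ends up chosen half the time:

```python
def simulate(steps):
    """Hypothetical toy model: each option's estimated value equals how often the
    agent has taken the *other* option. The agent deterministically picks the
    higher-valued option, breaking ties toward "blank"."""
    counts = {"blank": 0, "shiny": 0}
    for _ in range(steps):
        value_blank = counts["shiny"]  # taking shiny more makes blank look better
        value_shiny = counts["blank"]  # taking blank more makes shiny look better
        choice = "blank" if value_blank >= value_shiny else "shiny"
        counts[choice] += 1
    return counts

print(simulate(1000))  # {'blank': 500, 'shiny': 500}
```

No randomness is used anywhere, yet the choice sequence is balanced, which is the sense in which the agent “deterministically randomizes”.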
The optimality proof might be valid. I didn’t understand which specific step you thought was wrong.
So, the original statement in the paper was:
It must then be the case that $\lim_{t\to\infty} \mathbb{E}_t[F_t(A_t) - F_t(B_t)] > \eta$ for every $A \in [A]$, $B \notin [A]$. Let $A$ be the first element of $[A]$ in $\mathcal{A}$. Since every class will be separated by at least $\eta$ in the limit, $\mathrm{sad}^t_\eta(F, \mathcal{A})$ will eventually be a distribution over just $[A]$. And since $A \sim A'$ for every $A, A' \in [A]$, by the definition of soft_argmax it must be the case that $\lim_{t\to\infty} [|\mathrm{sad}^t_\eta(F, \mathcal{A})_t - A_t|] = 0$.
The issue with this is the last sentence. It’s basically saying “since the two actions $A$ and $A'$ get equal expected utility in the limit, the total variation distance between a distribution over the two actions and one of the actions limits to zero”, which is false.
And it is specifically disproved by the second counterexample, where there are two actions that both result in 1 utility, so they’re both in the same equivalence class, but a probabilistic mixture between them (which $\mathrm{sad}^t_\eta$ converges to playing, for all $\eta$) gets less than 1 utility.
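To see why the quoted last step fails, here is a small sketch computing the total variation distance between a mixture putting weight $w$ on $A$ and $1-w$ on $A'$, and the point mass on $A$. The distance is $1 - w$ no matter how the two actions’ expected utilities compare, so it only vanishes if $w \to 1$. The action labels are placeholders:

```python
def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

A, A_prime = "A", "A'"
point_mass_A = {A: 1.0}
for w in (0.5, 0.9, 0.99):
    mixture = {A: w, A_prime: 1 - w}
    print(w, total_variation(mixture, point_mass_A))
```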
Consider the following embedder. According to this embedder, you will play chicken against ADT-epsilon who knows who you are. When ADT-epsilon considers this embedder, it will always pass the reality filter, since in fact ADT-epsilon is playing against ADT-epsilon. Furthermore, this embedder gives NeverSwerveBot a high utility. So ADT-epsilon expects a high utility from this embedder, through NeverSwerveBot, and it never swerves.
You’ll have to be more specific about “who knows who you are”. If it unpacks as “opponent only uses the embedder where it is up against [whatever policy you plugged in]”, then NeverSwerveBot will have a high utility, but it will get knocked down by the reality filter. If you converge to never swerving, $\mathbb{E}_t(U_t)$ will converge to 0, and the inductor will learn that $\mathrm{straight} = \mathrm{argmax}\,F_t = \mathrm{ADT}_t$, so it will converge to assigning equal expected value to $F(\mathrm{straight})$ and $F(\mathrm{ADT})$. Since $\mathbb{E}(F(\mathrm{straight}))$ converges to 1, $\mathbb{E}(F(\mathrm{ADT}))$ does too, far from $\mathbb{E}(U) = 0$, so $F$ fails the reality filter.
If it unpacks as “opponent is ADT-epsilon”, and you converge to never swerving, then argmaxing will start duplicating the swerve strategy instead of going straight. In both cases, the argument fails.
I got an improved reality filter that blocks a certain class of environments that lead conjecture 1 to fail, although it isn’t enough to deal with the provided chicken example and lead to a proof of conjecture 1. (The $t$ subscripts will be suppressed for clarity.)
Instead of the reality filter for $E$ being $|\mathbb{E}(E(\mathrm{ADT})) - \mathbb{E}(U)| < \epsilon$,
it is now
This doesn’t just check whether reality is recovered on average; it also checks whether all the “plausible conditionals” line up as well. Some of the conditionals may not be well-formed, as there may be conditioning on low- or zero-probability events, but these are then multiplied by a very small number, so no harm is done.
This has the nice property that for all “plausibly chosen embedders” $F$ whose probability of being chosen is sufficiently far from 0, all embedders $E$ and $E'$ that pass this reality filter satisfy $\mathbb{E}(E(\mathrm{ADT}) \mid \mathrm{ADT} = \mathrm{am}_F) \simeq_t \mathbb{E}(E'(\mathrm{ADT}) \mid \mathrm{ADT} = \mathrm{am}_F)$.
So all embedders that pass the reality filter will agree on the expected utility of selecting a particular embedder that isn’t very unlikely to be selected.
I figured out what feels slightly off about this solution. For events like “I have a long memory and accidentally dropped a magnet on it”, it intuitively feels like describing your spot in the environment and the rules of your environment has much lower K-complexity than finding a Turing machine/environment that starts by giving you the exact (long) scrambled sequence of memories that you have and then resumes normal operation.
Although this also feels like something nearby is actually desired behavior. If you rewrite the tape to be describing some other simple environment, you would intuitively expect the AIXI to act as if it’s in the simple environment for a brief time before gaining enough information to conclude that things have changed and rederive the new rules of where it is.
Not quite. If taking bet 9 is a prerequisite to taking bet 10, then AIXI won’t take bet 9, but if bet 10 gets offered whether or not bet 9 is accepted, then AIXI will be like “ah, future me will take the bet, and wind up with $10+\epsilon$ in the heads world and $-20+2\epsilon$ in the tails world. This is just a given. I’ll take this +15/−15 bet as it has net positive expected value, and the loss in the heads world is more than counterbalanced by the reduction in the magnitude of loss for the tails world.”
Something else feels slightly off, but I can’t quite pinpoint it at this point. Still, I guess this solves my question as originally stated, so I’ll PM you for payout. Well done!
(btw, you can highlight a string of text and hit ctrl+4 to turn it into math mode)
Yup, I meant counterfactual mugging. Fixed.
I think I remember the original ADT paper showing up on the Agent Foundations forum before a writeup on logical EDT with exploration, and that affected my impression of which came first. Also, the “this is detailed in this post” was referring to logical EDT with exploration. I’ll edit for clarity.