# Diffractor

Karma: 188


The basic reason for the dependency relation to care about oracle queries from strategies is that, when you have several players all calling the oracle on each other, there’s no good way to swap out the oracle calls with the computation. The trick you describe does indeed work, and is a reason to not call any more Turing machines than you need to, but there are several things it doesn’t solve. For instance, if you are player 1, and your strategy depends on oracle calls to players 2 and 3, and the same applies to the other two players, you may be able to swap out an oracle call to player 2 with player 2’s actual code (which calls players 1 and 3), but you can’t unpack any more oracle calls into their respective computations without hitting an infinite regress.
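A toy way to see the regress, assuming we only track *which* oracle calls appear in each player's code (the `ORACLE_CALLS` table and the `remaining_oracle_calls` helper are hypothetical illustrations, not part of the formalism): inlining one level of code removes the oracle calls at that level but brings in the callees' own calls, so the count of unresolved oracle calls never reaches zero.

```python
# Hypothetical three-player setup: each player's strategy makes oracle
# calls to the other two players.
ORACLE_CALLS = {1: [2, 3], 2: [1, 3], 3: [1, 2]}


def remaining_oracle_calls(player: int, depth: int) -> int:
    """Count the oracle calls left after inlining `depth` levels of code.

    At depth 0 the player's code is untouched, so all of its oracle calls
    remain. Each further level replaces every remaining oracle call with the
    callee's actual code, which contains oracle calls of its own.
    """
    if depth == 0:
        return len(ORACLE_CALLS[player])
    return sum(remaining_oracle_calls(callee, depth - 1)
               for callee in ORACLE_CALLS[player])


for d in range(4):
    print(d, remaining_oracle_calls(1, d))
```

The count doubles with each level of inlining (2, 4, 8, 16, ...), which is why the sketch counts calls instead of trying to execute the fully inlined program: that execution would never bottom out.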

I’m not sure what you mean by fixing the utility function occurring *before* fixing the strategy. In the problem setup of a game, you specify a utility function machine and a strategy machine for everyone, and there isn’t any sort of time or order on this (there’s just a set of pairs of probabilistic oracle machines), and you can freely consider things such as “what happens when we change some player’s strategies/utility function machines”.

# Cooperative Oracles

# VOI is Only Nonnegative When Information is Uncorrelated With Future Action

Ah, the formal statement was something like “if the policy A isn’t the argmax policy, the successor policy B must be in the policy space of the future argmax, and the action selected by policy A is computed so the relevant equality holds”

Yeah, I am assuming fast feedback that it is resolved on day .

What I meant was that the computation isn’t extremely long in the sense of description length, not in the sense of computation time. Also, we aren’t doing policy search over the set of all Turing machines; we’re doing policy search over some smaller set of policies that can be guaranteed to halt in a reasonable time (and more can be added as time goes on).

Also I’m less confident in conditional future-trust for all conditionals than I used to be, I’ll try to crystallize where I think it goes wrong.

First: That notation seems helpful. Fairness of the environment isn’t present by default; it still needs to be assumed even if the environment is purely action-determined, as you can consider an agent in the environment that is using a hardwired predictor of what the argmax agent would do. It is just a piece of the environment, and feeding a different sequence of actions into the environment as input gets a different score, so the environment is purely action-determined, but it’s still unfair in the sense that the expected utility of feeding action into the function drops sharply if you condition on the argmax agent selecting action . The third condition was necessary to carry out this step. The intuitive interpretation of the third condition is that, if you know that policy B selects action 4, then you can step from “action 4 is taken” to “policy B takes the actions it takes”, and if you have a policy where you don’t know what action it takes (third condition is violated), then “policy B does its thing” may have a higher expected utility than any particular action being taken, even in a fair environment that only cares about action sequences, as the hamster dance example shows.

Second: I think you misunderstood what I was claiming. I wasn’t claiming that logical inductors attain the conditional future-trust property, even in the limit, for all sentences or all true sentences. What I was claiming was: The fact that is provable or disprovable in the future (in this case, is ), makes the conditional future-trust property hold (I’m fairly sure), and for statements where there isn’t guaranteed feedback, the conditional future-trust property may fail. The double-expectation property that you state does not work to carry the proof through, because the proof (from the perspective of the first agent) takes as an assumption, so the “conditional on ” part *has* to be outside of the future expectation when you go back to what the first agent believes.

Third: the sense I meant for “agent is able to reason about this computation” is that the computation is not extremely long, so logical inductor traders can bet on it.

# Probabilistic Tiling (Preliminary Attempt)

Pretty much that, actually. It doesn’t seem *too* irrational, though. Upon looking at a mathematical universe where torture was decided upon as a good thing, it isn’t an obvious failure of rationality to hope that a cosmic ray flips the sign bit of the utility function of an agent in there.

The practical problem with values that care about other mathematical worlds, however, is that if the agent you built has a UDT prior over values, it’s an improvement (from the perspective of the prior) for the nosy neighbors/values that care about other worlds to dictate some of what happens in your world (since the marginal contribution of your world to the prior expected utility looks like some linear combination of the various utility functions, weighted by how much they care about your world). So, in practice, it’d be a bad idea to build a UDT value learning prior containing utility functions that have preferences over all worlds, since it’d add a bunch of extra junk from different utility functions to our world if run.
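A sketch of the linear-combination point with made-up numbers (the actions, utilities, and the `best_action` helper are all hypothetical): each candidate value function contributes its utility for our world, scaled by its prior weight and by how much it cares about our world. Even a value function that mostly cares about other worlds can flip which action wins here.

```python
from typing import Callable, Dict, List, Tuple

# (prior weight, how much it cares about our world, utility over our actions)
ValueFn = Tuple[float, float, Callable[[str], float]]


def best_action(actions: List[str], values: List[ValueFn]) -> str:
    """Pick the action maximising the weighted sum of utilities in our world."""
    def score(action: str) -> float:
        return sum(prior * care * utility(action)
                   for prior, care, utility in values)
    return max(actions, key=score)


ACTIONS = ["park", "factory"]
LOCAL_UTILITY: Dict[str, float] = {"park": 1.0, "factory": 0.8}
NOSY_UTILITY: Dict[str, float] = {"park": 0.0, "factory": 1.0}

local = (0.5, 1.0, LOCAL_UTILITY.__getitem__)  # cares fully about our world
nosy = (0.5, 0.3, NOSY_UTILITY.__getitem__)    # mostly cares about other worlds

print(best_action(ACTIONS, [local]))         # park
print(best_action(ACTIONS, [local, nosy]))   # factory
```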

If exploration is a hack, then why do pretty much all multi-armed bandit algorithms rely on exploration into suboptimal outcomes to prevent spurious underestimates of the value associated with a lever?
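A minimal epsilon-greedy sketch on two Bernoulli arms illustrates the point, assuming the better arm's estimate starts spuriously low, as if its first pull was unlucky (the `init_estimates` values and arm means are hypothetical). With `epsilon = 0` the greedy rule never revisits the underestimated arm; with a little exploration its estimate gets corrected.

```python
import random
from typing import List


def epsilon_greedy(true_means: List[float], steps: int, epsilon: float,
                   init_estimates: List[float], seed: int = 0) -> List[int]:
    """Run epsilon-greedy on Bernoulli arms and return new pulls per arm.

    Each entry of init_estimates is treated as the sample mean of one earlier
    pull, standing in for an unlucky (or lucky) first draw.
    """
    rng = random.Random(seed)
    n = len(true_means)
    estimates = list(init_estimates)
    counts = [1] * n   # the initial estimates count as one pull each
    pulls = [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                           # explore
        else:
            arm = max(range(n), key=lambda i: estimates[i])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
    return pulls


# Arm 0 is better (0.7 vs 0.3) but starts with a spurious estimate of 0.0.
print(epsilon_greedy([0.7, 0.3], 1000, 0.0, [0.0, 1.0]))  # [0, 1000]
print(epsilon_greedy([0.7, 0.3], 5000, 0.1, [0.0, 1.0]))
```

In the greedy run, arm 1's running mean stays strictly positive, so it always beats arm 0's frozen estimate of 0.0; in the exploring run, arm 0 ends up pulled far more often once its estimate has been corrected.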

# Conditioning, Counterfactuals, Exploration, and Gears

Since beliefs/values combinations can be ruled out, would it then be possible to learn values by asking the human about their own beliefs?

It doesn’t hurt my brain, but there’s a brain fog that kicks in eventually, that’s kind of like a blankness with no new ideas coming, an aversion to further work, and a reduction in working memory, so I can stare at some piece of math for a while, and not comprehend it, because I can’t load all the concepts into my mind at once. It’s kind of like a hard limit for any cognition-intensive task.

This kicks in around the 2-hour mark for really intensive work/studying, although for less intensive work/studying, it can vary all the way up to 8 hours. As a general rule of thumb, the -afinil class of drugs triples my time limit until the brain fog kicks in, at a cost of less creative and lateral thinking.

Because of this, my study habits for school consisted of alternating 2-hour study blocks and naps.

# Program Search and Incomplete Understanding

The beliefs aren’t arbitrary; they’re still reasoning according to a probability distribution over propositionally consistent “worlds”. Furthermore, the beliefs converge to a single number in the limit of updating on theorems, even if the sentence of interest is unprovable. Consider some large but finite set S of sentences that haven’t been proved yet, such that the probability of sampling a sentence in that set before sampling the sentence of interest “x” is very close to 1. Then pick a time N large enough that, by that time, all the logical relations between the sentences in S will have been found. Then, with probability very close to 1, either “x” or “notx” will be sampled without going outside of S.

So, if there’s some cool new theorem that shows up relating “x” and some sentence outside of S, like “y->x”, well, you’re almost certain to hit either “x” or “notx” before hitting “y”, because “y” is outside S, so this hot new theorem won’t affect the probabilities by more than a negligible amount.
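A back-of-the-envelope version of this argument, assuming a memoryless sampler with fixed per-round weights (the specific weights and the `leave_before_decided_bound` helper are hypothetical): if each round draws “x”, “notx”, or some sentence outside S with the given weights, the chance that a sentence outside S comes first is just its share of those three weights. Draws of other sentences in S can only settle “x” sooner, so ignoring them gives an upper bound.

```python
from fractions import Fraction


def leave_before_decided_bound(p_x, p_notx, p_outside):
    """Upper bound on the chance of drawing a sentence outside S before x or notx.

    Assumes a memoryless sampler: each round draws x with weight p_x, notx with
    weight p_notx, some sentence outside S with total weight p_outside, and
    otherwise some other sentence in S (which can only decide x sooner).
    """
    return p_outside / (p_x + p_notx + p_outside)


# Hypothetical weights: x and notx each 1/1000, everything outside S 1/10**6.
print(leave_before_decided_bound(Fraction(1, 1000), Fraction(1, 1000),
                                 Fraction(1, 10**6)))  # 1/2001
```

Enlarging S shrinks `p_outside`, and the bound goes to 0 with it, which is why a late theorem like “y->x” with “y” outside S can only move the probability of “x” by a negligible amount.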

Also I figured out how to generalize the prior a bit to take into account arbitrary constraints other than propositional consistency, though there’s still kinks to iron out in that one. Check this.

Yup, that particular book is how I learned to prove stuff too. (well, actually, there was a substantial time delay between reading that and being able to prove stuff, but it’s an extremely worthwhile overview)

You’re pretending that it’s what nature is doing when you update your prior. It works when sentences are shown to you in an adversarial order, but there’s the weird aspect that this prior expects the sentences to go back to being drawn from some fixed distribution afterwards. It doesn’t do a thing where it goes “ah, I’m seeing a bunch of blue blocks selectively revealed, even though I think there’s a bunch of red blocks, so the next block I’ll have revealed will probably be blue”. Instead, it just sticks with its prior on red and blue blocks.

There’s a misconception: it isn’t about finding sentences of the form and , because if you do that, it immediately disproves . It’s actually about merely finding many instances of where has probability, and this lowers the probability of . This is *kind* of like how finding out about the Banach-Tarski paradox (something you assign low probability to) may lower your degree of belief in the axiom of choice.

The particular thing that prevents trolling is that in this distribution, there’s a fixed probability of drawing on the next round no matter how many implications and ’s you’ve found so far. So the way it evades trolling is a bit cheaty, in a certain sense, because it believes that the sequence of truth or falsity of math sentences that it sees is drawn from a certain fixed distribution, and doesn’t do anything like believing that it’s more likely to see a certain class of sentences come up soon.
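To see the direction of the effect in the simplest setting, here is ordinary Bayesian conditioning on a toy prior (an assumption for illustration, not the sampling prior discussed here) in which a sentence x and several sentences y_i are independent, each y_i has low probability, and we learn the implications x→y_i. The names x and y_i and the `prob_x_given_implications` helper are hypothetical; the sketch is meant only to show that each such implication pulls the probability of x down.

```python
from fractions import Fraction
from itertools import product


def prob_x_given_implications(p_x, p_ys):
    """P(x | x->y_1, ..., x->y_n) under a toy prior in which x and every y_i
    are independent, with P(x) = p_x and P(y_i) = p_ys[i]."""
    marginals = [p_x] + list(p_ys)
    mass_all = Fraction(0)     # worlds where every implication x->y_i holds
    mass_with_x = Fraction(0)  # ... and where x is also true
    for world in product([True, False], repeat=len(marginals)):
        weight = Fraction(1)
        for truth, p in zip(world, marginals):
            weight *= p if truth else 1 - p
        x, ys = world[0], world[1:]
        if all((not x) or y for y in ys):
            mass_all += weight
            if x:
                mass_with_x += weight
    return mass_with_x / mass_all


half, tenth = Fraction(1, 2), Fraction(1, 10)
for n in range(3):
    print(n, prob_x_given_implications(half, [tenth] * n))
# 0 1/2
# 1 1/11
# 2 1/101
```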

There’s a difference between “consistency” (it is impossible to derive X and notX for any sentence X, this requires a halting oracle to test, because there’s always more proof paths), and “propositional consistency”, which merely requires that there are no contradictions discoverable by boolean algebra only. So A^B is propositionally inconsistent with notA, and propositionally consistent with A. If there’s some clever way to prove that B implies notA, it wouldn’t affect the propositional consistency of them at all. Propositional consistency of a set of sentences can be verified in exponential time.
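A brute-force truth-table check makes the exponential-time claim concrete. This is a minimal sketch, assuming sentences are given as Python functions of an assignment of truth values to opaque atoms (the representation and function names are hypothetical):

```python
from itertools import product


def propositionally_consistent(sentences, atoms):
    """Truth-table check: is there an assignment to the atoms making every
    sentence true? Tries up to 2**len(atoms) assignments."""
    for values in product([True, False], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if all(sentence(assignment) for sentence in sentences):
            return True
    return False


def a_and_b(v):
    return v["A"] and v["B"]


def not_a(v):
    return not v["A"]


def a(v):
    return v["A"]


print(propositionally_consistent([a_and_b, not_a], ["A", "B"]))  # False
print(propositionally_consistent([a_and_b, a], ["A", "B"]))      # True
```

Here A and B are opaque atoms, so a non-propositional proof that B implies notA is invisible to this check, matching the point that such a proof doesn't change propositional consistency.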

I read through the entire Logical Induction paper, most-everything on Agent Foundations Forum, the advised Linear Algebra textbook, part of a Computational Complexity textbook, and the Optimal Poly-Time Estimators paper.

I’d be extremely interested in helping out other people with learning MIRI-relevant math, having gone through it solo. I set up a Discord chatroom for it, but it’s been pretty quiet. I’ll PM you both.

Giles Edkins coded up a thing which lets you plug in numbers for a 2-player, 2-move game payoff matrix and it automatically displays possible outcomes in utility-space. It may be found here. The equilibrium points and strategy lines were added later in MS Paint.
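The computation behind such a plot is simple enough to sketch. This is not Giles Edkins's code, just an illustration assuming each player's mixed strategy is a single probability of playing their first move; the Prisoner's Dilemma payoffs and the helper names are chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# payoffs[i][j] = (utility to player 1, utility to player 2) when player 1
# plays move i and player 2 plays move j.


def expected_payoffs(payoffs, p, q):
    """Expected utility pair when player 1 plays move 0 with probability p
    and player 2 plays move 0 with probability q."""
    probs1 = (p, 1 - p)
    probs2 = (q, 1 - q)
    u1 = sum(probs1[i] * probs2[j] * payoffs[i][j][0]
             for i, j in product(range(2), repeat=2))
    u2 = sum(probs1[i] * probs2[j] * payoffs[i][j][1]
             for i, j in product(range(2), repeat=2))
    return u1, u2


def utility_space_points(payoffs, steps=10):
    """Sample achievable utility pairs over a grid of mixed strategies."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    return [expected_payoffs(payoffs, p, q) for p in grid for q in grid]


# Prisoner's Dilemma as an example (move 0 = cooperate, move 1 = defect).
PD = [[(3, 3), (0, 5)],
      [(5, 0), (1, 1)]]
print(expected_payoffs(PD, 1, 1))                            # (3, 3)
print(expected_payoffs(PD, Fraction(1, 2), Fraction(1, 2)))  # (Fraction(9, 4), Fraction(9, 4))
print(len(utility_space_points(PD)))                         # 121
```

Plotting the points from `utility_space_points` traces out the region of utility-space reachable by independent mixed strategies; the four pure outcomes are its corners.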