Phylactery Decision Theory

This is definitely a hack, but it seems to solve many problems around Cartesian Boundaries. Much of this is a development of earlier ideas about the Predict-O-Matic; see there if something is unclear.

Phylactery Decision Theory takes a Base Decision Theory (BDT) as an input and builds something around it, creating a new, modified decision theory. Its purpose is to give its base the ability to """learn""" its position in the world.

I’ll start by explaining a model of it in a Cartesian context. Let’s say we have an agent with a set of designated input and output channels. It makes its “decisions” like this: First, it has a probability distribution over everything, including the future values of the output channels, and updates it based on the input. There is then an automated mechanism which assigns the output bits, and it sets each output with exactly the probability that the agent assigned to it. The agent’s prior includes something like the following: “The output bits will be like BDT(MyProbabilityDistribution, MyUtilityFunction, PossibleOutputs)”. Then it is easy to see that this belief is stable: since the agent believes it, the mechanism will set the output bits that way, the agent will observe this, and it will notice that its beliefs were right. So far this is just a more complicated way to make a BDT agent. It’s like a daemon inside an oracle, but on purpose.
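
To make the setup concrete, here is a minimal Python sketch of the Cartesian version. Everything in it (the names bdt, PhylacteryAgent, automated_mechanism, and the stand-in utility) is my own illustration of the construction, not part of the original proposal.

```python
import random

def bdt(beliefs, utility, possible_outputs):
    # Stand-in Base Decision Theory: pick the output with the highest
    # utility under the agent's current beliefs.
    return max(possible_outputs, key=lambda o: utility(o, beliefs))

class PhylacteryAgent:
    def __init__(self, prior, utility, possible_outputs):
        self.beliefs = prior                  # distribution over everything,
        self.utility = utility                # including its own future outputs
        self.possible_outputs = possible_outputs

    def predicted_output_dist(self):
        # The prior puts its mass on "the outputs will be like
        # BDT(MyProbabilityDistribution, MyUtilityFunction, PossibleOutputs)".
        chosen = bdt(self.beliefs, self.utility, self.possible_outputs)
        return {o: (1.0 if o == chosen else 0.0) for o in self.possible_outputs}

def automated_mechanism(agent):
    # The mechanism sets each output with exactly the probability the
    # agent assigned to it, which makes the BDT belief self-fulfilling.
    dist = agent.predicted_output_dist()
    outputs, weights = zip(*dist.items())
    return random.choices(outputs, weights=weights)[0]

# Toy check of stability: the agent believes the output will be "b",
# and the mechanism indeed produces "b", confirming the belief.
agent = PhylacteryAgent(prior={}, utility=lambda o, b: float(o == "b"),
                        possible_outputs=["a", "b"])
assert automated_mechanism(agent) == "b"
```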

Descending to Possibility

Now consider a case where we don’t know the set of output channels ahead of time. Instead, start with a set of things OP that we think might be output channels (which, if we are very uncertain, can just mean all events). The agent’s prior will then consist of many versions of the one previously suggested, one for each subset of OP. We start by assigning probability 1-ε to the theory that all potential outputs are like BDT(MyProbabilityDistribution, MyUtilityFunction, OP). Then we assign probability on the order of ε to each theory that all but one potential output are like BDT(MyProbabilityDistribution, MyUtilityFunction, OP\{O_i}). Similarly, each hypothesis that all but two are according to BDT gets probability on the order of ε², and so on. Then for ε sufficiently close to 0, the agent will eventually learn the “right” set of outputs, because for the others it sees that it can’t control them.
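
As a hedged illustration of this descent, here is a toy sketch of the prior over subsets of OP and the update that eliminates hypotheses claiming control over channels that turn out not to follow BDT. The exact weights (1-ε)^|S| · ε^k are my own choice; anything on the order of ε^k per dropped channel behaves the same way.

```python
from itertools import combinations

def subset_prior(OP, epsilon):
    # One hypothesis per subset S of OP: "exactly the channels in S
    # behave like BDT(...)". Dropping k channels costs a factor ~epsilon^k.
    prior = {}
    for k in range(len(OP) + 1):
        for dropped in combinations(OP, k):
            S = frozenset(OP) - frozenset(dropped)
            prior[S] = (1 - epsilon) ** len(S) * epsilon ** k
    total = sum(prior.values())
    return {S: w / total for S, w in prior.items()}

def bayes_update(prior, observed, bdt_choice):
    # A hypothesis is refuted once any channel it claims follows BDT
    # is observed to differ from what BDT chose for it.
    posterior = {S: (w if all(observed[c] == bdt_choice[c] for c in S) else 0.0)
                 for S, w in prior.items()}
    total = sum(posterior.values())
    return {S: w / total for S, w in posterior.items()}
```

Channels the agent can’t actually control eventually produce an observation that differs from BDT’s choice, refuting every hypothesis that includes them; with ε small, the surviving mass concentrates on the largest set of channels that really do follow BDT.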

For the input channels, we can’t entirely do away with labeling them, since our probability distribution needs to update on them, but we can try to avoid assuming that they are all the influence the world has on our internals. This starts with a simplification: instead of thinking explicitly that the output depends on our probability distribution, we start out giving probability 1-ε to the outputs being like BDT(Truth, MyUtilityFunction, PossibleOutputs). Of course, the agent will still have to evaluate Truth with its own beliefs, but this formally allows the outputs to depend on facts whether or not they were known at the time, with the evaluation happening later, which we need if we want to find out what we can know. Then we give probability on the order of ε to something that’s very close to that, and so on. The agent then converges on the closest thing (by the metric used to construct the descent) to maximizing utility that it can in fact do with the information it really has.
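
The input-side descent can be sketched the same way. In this toy version (again my own illustration, with an arbitrary metric), each hypothesis says the outputs follow BDT evaluated on some subset of the true facts, and weight falls off with distance from the full Truth:

```python
from itertools import combinations

def info_prior(all_facts, epsilon):
    # One hypothesis per subset F of the true facts: "the outputs are like
    # BDT evaluated on F". Hiding k facts costs a factor ~epsilon^k, so the
    # hypothesis closest to BDT(Truth, ...) starts with the most weight.
    prior = {}
    facts = frozenset(all_facts)
    for k in range(len(all_facts) + 1):
        for hidden in combinations(all_facts, k):
            prior[facts - frozenset(hidden)] = epsilon ** k
    total = sum(prior.values())
    return {F: w / total for F, w in prior.items()}
```

Hypotheses that require acting on facts the agent never actually has access to keep getting refuted, so the posterior settles on the closest-to-Truth behaviour that the agent’s real information can support.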

Interpretation

So it’s not quite as hacky as it seems. If we combine learning both the input and output channels, our first hypothesis would be that we can control everything, and that it will be exactly as we want it to be. So, one way to describe what that program does to EDT is this (a code sketch follows the list):

  1. Start out believing in the best possible world

  2. When your theory is inconsistent with the evidence, throw it out and believe in the next best possible world instead

  3. Iterate
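
Here is a toy sketch of that loop, where worlds_best_first and consistent are hypothetical stand-ins for the agent’s ordered hypotheses and its evidence check:

```python
def converge(worlds_best_first, evidence_stream, consistent):
    # Step 1: believe the best possible world. Step 2: on contradiction,
    # fall back to the next best world consistent with all evidence so far.
    # Step 3: iterate over incoming evidence. (Assumes at least one world
    # in the list is consistent with everything observed.)
    worlds = list(worlds_best_first)
    seen, i = [], 0
    for e in evidence_stream:
        seen.append(e)
        while not all(consistent(worlds[i], ev) for ev in seen):
            i += 1  # throw the theory out, take the next best one
    return worlds[i]
```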

And this will converge on the best possible self-fulfilling prophecy. So far, that sounds sane. The problem is that it doesn’t explicitly do that. It doesn’t have a concept of “self-fulfilling prophecy”. It doesn’t even seem to know that its beliefs have any effect on the world: it processes evidence of that, but doesn’t represent it. And I don’t know how to make a program that does this explicitly. So there’s good reason to think that this will not be a robust solution, but I can’t yet see how it fails, and it may contain fragments of an actual solution.