# SamEisenstat

Karma: 36 (LW), 2 (AF)

NewTop

# Reflective oracles as a solution to the converse Lawvere problem

I misunderstood your proposal, but you don’t need to do this work to get what you want. You can just take each sentence as an axiom, but declare that this axiom takes symbols to invoke. This could be done by changing the notion of length of a proof, or by taking axioms and with very long.

Yeah, I had something along the lines of what Paul said in mind. I wanted not to require that the circuit implement exactly a given function, so that we could see if daemons show up in the output. It seems easier to define daemons if we can just look at input-output behaviour.

I’m having trouble thinking about what it would mean for a circuit to contain daemons such that we could hope for a proof. It would be nice if we could find a simple such definition, but it seems hard to make this intuition precise.

For example, we might say that a circuit contains daemons if it displays more optimization that necessary to solve a problem. Minimal circuits could have daemons under this definition though. Suppose that some function describes the behaviour of some powerful agent, a function is like with noise added, and our problem is to predict sufficiently well the function . Then, the simplest circuit that does well won’t bother to memorize a bunch of noise, so it will pursue the goals of the agent described by more efficiently than , and thus more efficiently than necessary.

Two minor comments. First, the bitstrings that you use do not all correspond to worlds, since, for example, implies , as is a subtheory of . This can be fixed by instead using a tree of sentences that all diagonalize against themselves. Tsvi and I used a construction in this spirit in A limit-computable, self-reflective distribution, for example.

Second, I believe that weakening #2 in this post also cannot be satisfied by any constant distribution. To sketch my reasoning, a trader can try to buy a sequence of sentences , spending $$2^{-n}$ on the \(n\)th sentence \(\phi_1 \wedge \dots \wedge \phi_n\). It should choose the sequence of sentences so that \(\phi_1 \wedge \dots \wedge \phi_n\) has probability at most \(2^{-n}\), and then it will make an infinite amount of money if the sentences are simultaneously true.

The way to do this is to choose each from a list of all sentences. If at any point you notice that has too high a probability, then pick a new sentence for . We can sell all the conjunctions for and get back the original amount payed by hypothesis. Then, if we can keep using sharper continuous tests of the probabilities of the sentences over time, we will settle down to a sequence with the desired property.

In order to turn this sketch into a proof, we need to be more careful about how these things are to be made continuous, but it seems routine.

I at first didn’t understand your argument for claim (2), so I wrote an alternate proof that’s a bit more obvious/careful. I now see why it works, but I’ll give my version below for anyone interested. In any case, what you really mean is the probability of deciding a sentence outside of

*by having it announced by nature*; there may be a high probability of sentences being decided indirectly via sentences in .Instead of choosing as you describe, pick so that the probability of sampling something in is greater than . Then, the probability of sampling something in is at least . Hence, no matter what sentences have been decided already, the probability that repeatedly sampling from selects before it selects any sentence outside of is at least

as desired.

Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering!

Oh, I didn’t notice that aspect of things. That’s pretty cool.

A few thoughts:

I agree that the LI criterion is “pointwise” in the way that you describe, but I think that this is both pretty good and as much as could actually be asked. A single efficiently computable trader can do a lot. It can enforce coherence on a polynomially growing set of sentences, search for proofs using many different proof strategies, enforce a polynomially growing set of statistical patterns, enforce reflection properties on a polynomially large set of sentences, etc. So, eventually the market will not be exploitable on all these things simultaneously, which seems like a pretty good level of accurate beliefs to have.

On the other side of things, it would be far to strong to ask for a uniform bound of the form “for every , there is some day such that after step , no trader can multiply its wealth by a factor more than “. This is because a trader can be hardcoded with arbitrarily many hard-to-compute facts. For every , there must eventually be a day on which the belief of your logical inductor assign probability less than to some true statement, at which point a trader who has that statement hardcoded can multiply its wealth by . (I can give a construction of such a sentence using self-reference if you want, but it’s also intuitively natural—just pick many mutually exclusive statements with nothing to break the symmetry.)

Thus, I wouldn’t think of traders as “mistakes”, as you do in the post. A trader can gain money on the market if the market doesn’t already know all facts that will be listed by the deductive process, but that is a very high bar. Doing well against finitely many traders is already “pretty good”.

What you can ask for regarding uniformity is for some simple function such that any trader can multiply its wealth by at most a factor . This is basically the idea of the mistake bound model in learning theory; you bound how many mistakes happen rather than when they happen. This would let you say a more than the one-trader properties I mentioned in my first paragraph. In fact, has this propery; is just the initial wealth of the trader. You may therefore want to do something like setting traders’ initial wealths according to some measure of complexity. Admittedly this isn’t made explicit in the paper, but there’s not much additional that needs to be done to think in this way; it’s just the combination of the individual proofs in the paper with the explicit bounds you get from the initial wealths of the traders involved.

I basically agree completely on your last few points. The traders are a model class, not an ensemble method in any substantive way, and it is just confusing to connect them to the papers on ensemble methods that the LI paper references. Also, while I use the idea of logical induction to do research that I hope will be relevant to practical algorithms, it seems unlikely than any practical algorithm will look much like a LI. For one thing, finding fixed points is really hard without some property stronger than continuity, and you’d need a pretty good reason to put it in the inner loop of anything.

Universal Prediction of Selected Bits solves the related question of what happens if the odd bits are adversarial but the even bits copy the preceding odd bits. Basically, the universal semimeasure learns to do the right thing, but the exact sense in which the result is positive is subtle and has to do with the difference between measures and semimeasures. The methods may also be relevant to the questions here, though I don’t see a proof for either question yet.

Yeah, I like tail dependence.

There’s this question of whether for logical uncertainty we should think of it more as trying to “un-update” from a more logically informed perspective rather than trying to use some logical prior that exists at the beginning of time. Maybe you’ve heard such ideas from Scott? I’m not sure if that’s the right perspective, but it’s what I’m alluding to when I say you’re introducing artificial logical uncertainty.

Yeah, the 5 and 10 problem in the post actually can be addressed using provability ideas, in a way that fits in pretty natually with logical induction. The motivation here is to work with decision problems where you can’t prove statements for agent , utility function , action , and utility value , at least not with the amount of computing power provided, but you want to use inductive generalizations instead. That isn’t necessary in this example, so it’s more of an illustration.

To say a bit more, if you make logical inductors propositionally consistent, similarly to what is done in this post, and make them assign things that have been proven already probability 1, then they will work on the 5 and 10 problem in the post.

It would be interesting if there was more of an analogy to explore between the provability oracle setting and the inductive setting, and more ideas could be carried over from modal UDT, but it seems to me that this is a different kind of problem that will require new ideas.

It’s hard to analyze the dynamics of logical inductors too precisely, so we often have to do things that feel like worst-case analysis, like considering an adversarial trader with sufficient wealth. I think that problems that show up from this sort of analysis can be expected to correspond to real problems in superintelligent agents, but that is a difficult question. The malignancy of the universal prior is part of the reason.

As to your proposed solution, I don’t see how it would work. Scott is proposing that the trader makes conditional contracts, which are in effect voided if the event that they are conditional on doesn’t happen, so the trader doesn’t actually lose anything is this case. (It isn’t discussed in this post, but conditional contracts can be built out of the usual sort of bets, with payoffs given by the definition of conditional probability.) So, in order to make the trader lose money, the events need to happen sometimes, not just be expect to happen with some nonnegligable probability by the market.

In counterfactual mugging with a logical coin, AsDT always uses a logical inductor’s best-estimate of the utility it would get right now, so it sees the coin as already determined, and sees no benefit from giving Omega money in the cases where Omega asks for money.

The way I would think about what’s going on is that if the coin is already known at the time that the expectations are evaluated, then the problem isn’t convergent in the sense of AsDT. The agent that pays up whenever asked has a constant action, but it doesn’t receive a constant expected utility. You can think of the averaging as introducing artificial logical uncertainty to make more things convergent, which is why it’s more updateless. (My understanding is that this is pretty close to how you’re thinking of it already.)

This isn’t too related to your main point, but every ordered field can be embedded into a field of Hahn series, which might be simpler to work with than surreals.

That page discusses the basics of Hahn series, but not the embedding theorem. (Ehrlich, 1995) treats things in detail, but is long and introduces a lot of definitions. The embedding theorem is stated on page 23 (24 in the pdf).

# Logical inductor limits are dense under pointwise convergence

Nicely done. I should have spent more time thinking about liar sentences; you really can do a lot with them.

Follow-up question—is the set of limits of logical inductors convex? (Your proof also makes me curious as to whether the set of “expanded limits” is convex, though that question is a bit less nice than the other).

I’m copying over some (lightly edited) conversation about this post from Facebook.

**Sam Eisenstat**: In the original counterexample to the Trolljecture, I guess you’re proposing that be logically prior to , so we can go back and replace with , but still have to use in order to derive ? Here I’m just using “” to mean “counterfactually results in” without meaning to imply that this should be defined modally.I agree that this is intuitively sensible, but it is difficult to give a general definition that agrees with intuition—it’s hard to even have an intuition in the general case. There has been some work along these lines though; see for example https://agentfoundations.org/item?id=241 though you may already know about this since you’ve been talking with Scott. I’m pessimistic about this direction in general, though I’d be happy to talk further about it.

**Jack Gallagher**: What’s the generator behind your being pessimistic?**Sam Eisenstat**: Well, like Daniel suggests, we can ask whether the ordering is subjective/based on an agents epistemic state. Unlike Daniel, I think there are problems with this. The best examples are things like transparent Newcomb’s problem, where the agent needs to counterfact on things that it already knows the fact of the matter about. You can try to find loopholes in this, e.g. maybe the agent is doing something wrong if it isn’t ignorant of certain facts, but I haven’t been able to find something that’s satisfying (e.g. doesn’t fall to other thought experiments).It thus seems that an agent should just counterfact on things even though it already knows them to be true, but this is a weird sort of thing to call “epistemic”. It doesn’t just depend on the agent’s probability assignments to various statements. Thus, to go farther here would seem to require a richer idea of “agent” than we presently have.

The alternative is an objective temporal ordering on logical statements. (This is a terrible use of the excluded middle, since “subjective” and “objective” are smuggling in a lot, so the place to look is probably somewhere that feels like a third option.) Anyway, it seems absurd for there to be a fact of the matter as to whether 417*596 = 248532 comes temporally before or after 251*622 = 156122.

Part of this is just a feeling of arbitrariness because even if one of those did come before the other, we don’t have to tools to know which one. Still, it seems the tools we do have are the wrong shape for building something that would work. For example, there seems to be an affinity between the ideas of being “easy”, of depending on few things, but being depended on by many, and of being temporally “early”. Patrick’s revived trolljecture is one attempt at formalizing it, but there was nothing deep about my counterexample, it’s just easy to generate statements that are difficult to evaluate. (There are a different versions of this, like independent statements in undecidable theories, or hard SAT instances in propositional logic.)

I used the word “we” in the last paragraph, but maybe you have type-theory tools that are more suited to this?

**Daniel Satanove**: It’s easier to come up with UDT when you have CDT. Is there a version of logical counterfactuals that works in more intuitive cases, but fail on stranger edge cases like transparent Newcomb’s?**Sam Eisenstat**: The EDT of logical counterfactuals is just EDT with a prior over logic (e.g. bounded Demski, something more sophisticated). Proof-based UDT is a special case of this; it prescribes actions if any reasonable prior would agree, and is silent otherwise.Unfortunately, as shown by the trolljecture counterexample, there are counterfactuals that proof-based UDT gets wrong (the trolljecture basically says that what proof-based UDT does should be read as counterfactuals), and therefore that EDT gets wrong given any reasonable prior, so it’s hard to define a “domain of validity” for this (not that that’s necessary for it be informative).

One candidate for the CDT of logical counterfactuals would just be putting a causal DAG structure on logical statements, like discussed in Scott’s forum post that I linked above. The TDT paper is informal, and I haven’t looked at it in a while, but I think that you can read it as proposing exactly this.

I believe that in all thought experiments where people have an intuition about what should be done, they are using this sort of reasoning plus strategy selection (i.e. updatelessness). I would be interested in seeing any thought experiments that people do not analyze in this way.

There isn’t much in the direction of actually defining such causal networks. Back when the trolljecture looked plausible we tried to interpret it in terms of causal networks, which didn’t quite work (see http://epsilonofdoom.blogspot.com/2015/08/provability-counterfactuals-vs-three.html). I expect something like Patrick’s revived trolljecture to be even harder to view in this way, since it’s inherently approximate; a notion of approximate causal network would be more appropriate here, if one could be found. I don’t know of any proposal that (1) actually defines a causal network and (2) works on some easy examples.

Well, that’s not entirely true. You can just pick any ordering that puts the agents decision at the beginning, and use the resulting fully-connected DAG. This works wherever proof-based UDT works, but fails on the trolljecture, exactly because it assumes anything that provably follows from your agent’s action is causally downstream of it.

**Tsvi BT**: “One candidate for the CDT of logical counterfactuals would just be putting a causal DAG structure on logical statements”How does this interact with logical updatelessness? What happens to your DAG as you learn new facts?

**Sam Eisenstat**: I’m not sure if it’s obvious, but the idea is that each node has a full table of counterfactuals. An instance of this has more data than just an ordering.It’s not intended to be updateless; it’s a CDT/TDT-style theory rather than UDT-style. You could layer strategy selection on top of it (though it could be computationally difficult if your DAG is hard to work with).

When you learn new facts you condition the probability distribution associated with the DAG on them, like for any Bayesian network.

**Tsvi BT**: So then this is dynamically inconsistent in the counterfactual mugging with a logical coin.**Sam Eisenstat**: Oh yeah, “layer strategy selection on top” doesn’t work for logical uncertainty. Anyway, this was in response to Daniel’s request for a theory that “works in more intuitive cases”; I hope I didn’t give the impression that this would be hard to break.On the other hand, you could try to “just do strategy selection” with respect to a particular prior if you’re willing to treat everything you learn as an observation. This is easy if you pick a coherent prior, but it’s uncomputable and it doesn’t let you pay counterfactual muggers with logical coins. Abram’s GLS stuff tries to do this with priors that don’t already know all logical facts. What do you think of that sort of approach?

**Tsvi BT**: I don’t have much more concrete to say. I currently view decision theory as being the questions of how to do logical updatelessness, and how to do strategy selection with a bounded reasoner. The GLS prior might be close, but there are basic open questions about how it works. “Strategic uncertainty” (or something) may or may not just boil down to logical uncertainty; the reason it might not is that you might actually have to define an optimization target other than “what Ideal Me thinks is the best strategy”, to take into account that actual versions of you are bounded in different ways. These two “components” (I wildly speculate) could combine to give a reflectively stable decision theory.

**Daniel Satanove**: Proof lengths probably aren’t the only way of doing the time step thing. Take the proof length definition of observation at time k, except delay all proofs of statements of the form a + b = c by one time step. This, or something sort of like this, will probably also work. Also any strictly increasing function on observation time should work.The observation at time k should be an epistemic thing that describes how an agent learns of new theorems, rather than some fundamental property of math.

**Jack Gallagher**: What do you mean by “an epistemic thing”?**Daniel Satanove**: “In order to break the cycles, we want some notion of logical time from which influence can only propagate forward. Proof length seems, at least to me, to be the most (only?) natural way to pull something like that off. A proof of length k can be thought of as a logical observation at time k.”You are restricting yourself to agents who do a breadth first search through proofs rather than anything that can do a more directed approach. Saying that you want some notion of “logical time” kind of sounds like you want something that is a property of the math rather than a property of the agent.

[EDIT: This all deals with measures, not semimeasures; see below.]

For to dominate in the old sense means that its Bayes score can only be some constant worse than , no worse, regardless of environment. My definition here implies that, but I think it’s a stricter requirement.

Your definition is equivalent to the standard definition. Li and Vitányi say that dominates iff there is some such that for any binary string , we have . Li and Vitányi’s “probability distributions” take a binary string as input while the probability distributions we are using ask for an element of a -algebra, but we can abuse notation by allowing a binary string to represent the event of that the random sequence starts with this string as a prefix.

**Theorem.**For any probability distributions on Cantor space and any , the condition holds for all binary strings iff it holds for all .**Proof.**The direction is immediate. For , suppose that for all binary string events . We first show that this property holds for every event in the algebra generated by the binary string events. We can write any as a disjoint union of binary string events . Then, as desired.We can extend this to the -algebra generated by , which is just the Borel -algebra on Cantor space, by the monotone class theorem. I’m not sure how much measure theory you know, but you can think of this as a form of induction on measurable sets; if a property holds on an algebra of sets and it is preserved by countable monotone unions and intersections, then it holds on every set in the -algebra generated by . For countable monotone unions, if we have , then We can do the same thing for countable monotone intersections ;

EDIT: I’m talking about probability distributions rather than semimeasures. I don’t know how your definition of dominance is helpful for semimeasures though. The universal semimeasure is defined on binary sequences, and I think that question of whether dominates depends on how you extend it to the whole -algebra. I expect that many reasonable extensions of would satisfy your Theorem 1, but this post (in particular the proof of Theorem 1) doesn’t seem to choose a specific such extension.

Haskell doesn’t actually let you do this as far as I can tell, but the natural computational way to implement a function is with a case expression with no cases. This is sensible because has no constructors, so it should be possible to pattern match it with such an expression. Another way of thinking of this is that we can use pattern matching on sum types and is just the -ary sum type.

Condition 4 in your theorem coincides with Lewis’ account of counterfactuals. Pearl cites Lewis, but he also criticizes him on the ground that the ordering on worlds is too arbitrary. In the language of this post, he is saying that condition 2 arises naturally from the structure of the problem and that condition 4 is derives from the deeper structure corresponding to condition 2.

I also noticed that the function and the partial order can be read as “time of first divergence from the real world” and “first diverges before”, respectively. This makes the theorem a lot more intuitive.

As you say, this isn’t a proof, but it wouldn’t be too surprising if this were consistent. There is some k∈N such that □nϕ→ϕ has a proof of length nk by a result of Pavel Pudlák (On the length of proofs of finitistic consistency statements in first order theories). Here I’m making the dependence on n explicit, but not the dependence on ϕ. I haven’t looked at it closely, but the proof strategy in Theorems 5.4 and 5.5 suggests that k will not depend on ϕ, as long as we only ask for the weaker property that □nϕ→ϕ will only be provable in length nk for sentences ϕ of length at most k.