# AidanGoth

Karma: 3
• I thought it dealt with these ok—could you be more specific?

It’s linear because it’s an expectation. It is under-specified in that it needs us to assume or prove the marginal distributions for the and I guess that’s problematic if an algorithm for doing that is a big part of what the authors are looking for. But if we do have marginal distributions for each , then are well-defined and .

• This question is in the spirit of “I think I’m doing something dumb / obviously wrong: help me see why” but it’s maybe too niche for this thread. (Answers that redirect me to a better place to ask are welcome.)

I recently read Paul Christiano, Eric Neyman and Mark Xu’s “Formalizing the presumption of independence” (https://arxiv.org/pdf/2211.06738.pdf). My understanding is that they aim to formalise some types of reasonable (but defeasible) “hand-waving” in otherwise formal proofs, in a way that maintains the underlying deductive structure of a formal proof and responds appropriately to new information / arguments. They’re particularly interested in heuristic estimators that presume the independence of random variables so long as we have no reason to think the variables aren’t independent and so long as we can adjust the estimate appropriately if we learn about their dependencies.

To that end, suppose we want to estimate , where is a set of real-valued random variables, , and we have a collection of deductively proved (in)equalities about . Then a natural heuristic estimator could be:

where each has the same marginal distributions as , (i.e. is equal to but with each instance of replaced by ), and where the are conditionally independent given . This formalises the idea that we assume we’ve thought of all the dependencies between the variables of interest and that they’re independent, conditional on everything we’ve thought of so far—but we can revise this estimate by conditioning on new information and dependencies later.

Before considering any information relating the to each other, assumes that they are unconditionally independent. As we condition on information about them, we update the estimate to account for this and maintain that the variables are conditionally independent, given the information considered so far. E.g. in the twin primes example, we can initially assume that and are independent, and then condition on the fact that if is prime, then is odd (this can be operationalised by considering the appropriate indicator function and conditioning on it taking value ) to adjust the estimate and assume (for now) that there are no further dependencies.
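A minimal numerical sketch of the twin primes step, under assumptions that are mine rather than the paper's: n is drawn uniformly from 1..n_max, the two events are "n is prime" and "n + 2 is prime", and the information conditioned on is the parity of n (which n + 2 shares). The function name `twin_prime_estimates` and the choice to condition on parity classes are hypothetical, chosen only for illustration. The sketch compares the unconditional independence estimate, the estimate that assumes independence only within each parity class, and the true frequency.

```python
from math import isqrt


def prime_sieve(limit):
    """Return a list where entry k is True iff k is prime, for 0 <= k <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = False
    if limit >= 1:
        is_prime[1] = False
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return is_prime


def twin_prime_estimates(n_max):
    """Estimate P(n and n + 2 both prime) for n uniform on 1..n_max.

    Returns (naive, parity_adjusted, actual):
      naive           -- product of the two marginal frequencies
                         (unconditional independence)
      parity_adjusted -- independence assumed only within each parity class,
                         then averaged over the classes
      actual          -- the true frequency, for comparison
    """
    is_prime = prime_sieve(n_max + 2)
    ns = range(1, n_max + 1)
    total = len(ns)

    # Marginal frequencies of the two events.
    p_n = sum(is_prime[n] for n in ns) / total
    p_n2 = sum(is_prime[n + 2] for n in ns) / total
    naive = p_n * p_n2

    # Condition on the parity of n, and presume independence within each class.
    parity_adjusted = 0.0
    for parity in (0, 1):
        cls = [n for n in ns if n % 2 == parity]
        weight = len(cls) / total
        q_n = sum(is_prime[n] for n in cls) / len(cls)
        q_n2 = sum(is_prime[n + 2] for n in cls) / len(cls)
        parity_adjusted += weight * q_n * q_n2

    actual = sum(is_prime[n] and is_prime[n + 2] for n in ns) / total
    return naive, parity_adjusted, actual


if __name__ == "__main__":
    print(twin_prime_estimates(100))
```

For n_max = 100 the three values are roughly 0.0625, 0.12 and 0.08. Conditioning on parity alone moves the estimate from below the true frequency to above it, which fits the idea that the estimate is provisional and gets revised as further dependencies are conditioned on.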

We always have . In fact, we always have . If we further have that doesn’t relate and (i.e. doesn’t include a formula containing both and ), then I think we have and , giving (i.e. without the primes).

My suggested heuristic estimator apparently has lots of nice properties thanks to being an expectation, including some of the informal properties listed in the paper, which can be stated formally (e.g. if doesn’t have an instance of any of the , then conditioning on it won’t change the heuristic estimate).

My suggested estimator jumped out at me pretty quickly as capturing (to my understanding) what the authors want, but I’d expect myself to be much worse at this than the authors, who will have spent much longer thinking about it. So my estimator seems “too good to be true” and I think it’s likely I’m pretty confused or missing something obvious and/or important. Please help me see what I’m missing! Some hypotheses:

• There’s something wrong / incoherent about my suggested heuristic estimator

• My suggested heuristic estimator is too general to be useful

• The paper mainly considers very specific special cases with specific algorithms for heuristic estimators rather than something as general as this, which might be difficult to implement in practice

# [Question] Can we learn much by studying the behaviour of RL policies?

15 May 2023 12:56 UTC
1 point