Scott Garrabrant comments on Finite Factored Sets

Scott Garrabrant 24 May 2021 15:38 UTC
LW: 4 AF: 3
AF
Ok, makes sense. I think you are just pointing out that when I am saying “general position,” that is relative to a given structure, like FFS or DAG or symmetric FFS.
If you have a probability distribution, it might be well modeled by a DAG, or a weaker condition is that it is well modeled by a FFS, or an even weaker condition is that it is well modeled by a SFFS (symmetric finite factored set).
We have a version of the fundamental theorem for DAGs and d-separation, we have a version of the fundamental theorem for FFS and conditional orthogonality, and we might get a version of the fundamental theorem for SFFS and whatever corresponds to conditional independence in that world.
However, I claim that even if we can extend to a fundamental theorem for SFFS, I still want to think of the independences in a SFFS as having different sources. There are the independences coming from orthogonality, and there are the independences coming from symmetry (or symmetry together with orthogonality).
In this world, orthogonality won’t be as inferable because it will only be a subset of independence, but it will still be an important concept. This is similar to what I think will happen when we go to the infinite dimensional factored sets case.
- cousin_it 24 May 2021 16:03 UTC
  LW: 6 AF: 2
  AF Parent
  Can you give some more examples to motivate your method? Like the smoking/tar/cancer example for Pearl’s causality, or Newcomb’s problem and counterfactual mugging for UDT.
  - Scott Garrabrant 24 May 2021 16:35 UTC
    LW: 13 AF: 6
    AF Parent
    Hmm, first I want to point out that the talk here sort of has natural boundaries around inference, but I also want to work in a larger frame that uses FFS for stuff other than inference.
    If I focus on the inference question, one of the natural questions that I answer is where I talk about grue/bleen in the talk.
    I think for inference, it makes the most sense to think about FFS relative to Pearl. We have this problem with looking at smoking/tar/cancer, which is: what if we carved things into variables the wrong way? What if instead of tar/cancer, we had a variable for “How much bad stuff is in your body?” and “What is the ratio of tar to cancer?” The point is that our choice of variables both matters for the Pearlian framework, and is not empirically observable. I am trying to do all the good stuff in Pearl without the dependence on the variables.
    Indeed, I think the largest crux between FFS and Pearl is something about variable realism. To FFS, there is no realism to a variable beyond its information content, so it doesn’t make sense to have two variables X, X’ with the same information, but different temporal properties. Pearl’s ontology, on the other hand, has these graphs with variables and edges that say “who listens to whom,” which sets us up to be able to have e.g. a copy function from X to X’, and an arrow from X to Y, which makes us want to say X is before Y, but X’ is not.
    For the more general uses of FFS, which are not about inference, my answer is something like “the same kind of stuff as Cartesian frames,” e.g. specifying embedded observations. (A partition $A$ observes a subset $E$ relative to a high level world model $W$ if $A ⊥ \{E, S ∖ E\}$ and $A ⊥ W | (S ∖ E)$. Notice the first condition is violated by transparent Newcomb, and the second condition is violated by counterfactual mugging. The symbols here should be read as combinatorial properties; there are no probabilities involved.)
    I want to be able to tell the stories like in the Saving Time post, where there are abstract versions of things that are temporally related.
    - Scott Garrabrant 24 May 2021 17:47 UTC
      LW: 2 AF: 2
      AF Parent
      Here is a more concrete example of me using FFS the way I intend them to be used outside of the inference problem. (This is one specific application, but maybe it shows how I intend the concepts to be manipulated).
      I can give an example of embedded observation maybe, but it will have to come after a more formal definition of observation (This is observation of a variable, rather than the observation of an event above):
      Definition: Given a FFS $F = (S, B)$, and $A$, $W$, $X$, which are partitions of $S$, where $X = \{x_{1}, \dots, x_{n}\}$, we say $A$ observes $X$ relative to $W$ if:
      1) $A ⊥ X$,
      2) $A$ can be expressed in the form $A = A_{1} \lor_{S} \dots \lor_{S} A_{n}$, and
      3) $A_{i} ⊥ W | (S ∖ x_{i})$ for each $i$.
      (This should all be interpreted combinatorially, not probabilistically.)
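      For reference, conditional orthogonality here is the notion from the main Finite Factored Sets post, restated as a sketch (the post has the authoritative statement). Writing $s \sim_{b} t$ when $s$ and $t$ lie in the same part of the factor $b$, and $s \sim_{X} t$ when they lie in the same part of $X$, the conditional history $h^{F}(X | E)$ is the smallest $H \subseteq B$ such that:
      $$\forall s, t \in E: \ \big(\forall b \in H,\ s \sim_{b} t\big) \Rightarrow s \sim_{X} t,$$
      $$\forall s, t \in E,\ r \in S: \ \big(\forall b \in H,\ r \sim_{b} s\big) \wedge \big(\forall b \in B ∖ H,\ r \sim_{b} t\big) \Rightarrow r \in E.$$
      Orthogonality is then disjointness of histories:
      $$X ⊥ Y | E \iff h^{F}(X | E) \cap h^{F}(Y | E) = \emptyset,$$
      with the unconditional $X ⊥ Y$ being the case $E = S$.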
      The intuition of what is going on here is that to observe an event, you are being promised that you 1) do not change whether the event holds, and 3) do not change anything that matters in the case where that event does not hold. Then, to observe a variable, you can basically 2) split yourself up into different fragments of your policy, where each policy fragment observes a different value of that variable. (This whole thing is very updateless.)
      Example 1: (non-observation)
      An agent $A = \{L, R\}$ does not observe a coinflip $X = \{H, T\}$, and chooses to raise either his left or right hand. Our FFS $F = (S, B)$ is given by $S = A \times X$, and $B = \{A, X\}$. (I am abusing notation here slightly by conflating $A$ with the partition you get on $A \times X$ by projecting onto the $A$ coordinate.) Then $W$ is the discrete partition on $A \times X$.
      In this example, we do not have observation. Proof: $A$ only has two parts, so if we express $A$ as a common refinement of 2 partitions, at least one of these two partitions must be equal to $A$. However, $A$ is not orthogonal to $W$ given $H$ and $A$ is not orthogonal to $W$ given $T$. ($h^{F}(A | H) = h^{F}(W | H) = h^{F}(A | T) = h^{F}(W | T) = \{A\}$.) Thus we must violate condition 3.
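      The proof above can be checked by brute force. The following is a minimal Python sketch, assuming the definition of conditional history from the main post (the smallest set of factors satisfying its two closure conditions). The function name `history` and the encoding of elements of $S$ as tuples with one coordinate per factor are illustrative choices, not part of the formalism.

```python
from itertools import combinations, product

def history(n, part, E):
    """Brute-force conditional history of `part` given E.

    Elements of S are tuples with one coordinate per factor (n factors);
    the result is a set of factor indices. Assumes S is the full product.
    """
    E = set(E)
    for size in range(n + 1):  # try candidate sets smallest-first
        for H in map(set, combinations(range(n), size)):
            # Condition 1: inside E, agreeing on the factors in H fixes the part.
            c1 = all(part(s) == part(t)
                     for s in E for t in E
                     if all(s[b] == t[b] for b in H))
            # Condition 2: taking the H-coordinates from s and the rest from t
            # never leaves E.
            c2 = all(tuple(s[b] if b in H else t[b] for b in range(n)) in E
                     for s in E for t in E)
            if c1 and c2:
                return H
    raise AssertionError("unreachable: the set of all factors always works")

# Example 1: coordinate 0 is the agent's choice (A), coordinate 1 is the coin (X).
S = list(product("LR", "HT"))

def A(s):
    return s[0]   # which hand is raised

def W(s):
    return s      # discrete partition: W sees everything

heads = [s for s in S if s[1] == "H"]
tails = [s for s in S if s[1] == "T"]

for name, E in (("H", heads), ("T", tails)):
    h_A, h_W = history(2, A, E), history(2, W, E)
    print(name, h_A, h_W, "orthogonal:", h_A.isdisjoint(h_W))
```

      In both cases the two histories come out as the single factor index 0 (the $A$ factor), so they are not disjoint and condition 3 fails.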
      Example 2: (observation)
      An agent $A = \{LL, LR, RL, RR\}$ does observe a coinflip $X = \{H, T\}$, and chooses to raise either his left or right hand. We can think of $A$ as actually choosing a policy that is a function from $X$ to $\{L, R\}$, where the two-character string naming each part of $A$ gives the result on H followed by the result on T.
      Our FFS $F = (S, B)$ is given by $S = X \times A_{H} \times A_{T}$, and $B = \{X, A_{H}, A_{T}\}$, where $A_{H} = \{L_{H}, R_{H}\}$ represents what the agent would do seeing heads, and $A_{T} = \{L_{T}, R_{T}\}$ represents what the agent would do seeing tails. $A = A_{H} \lor_{S} A_{T}$. We also have a partition representing what the agent actually does, $Y = \{L, R\}$, where $L$ and $R$ are each four-element sets in the obvious way. We will then say $W = X \lor_{S} Y$, so $W$ does not get to see what $A$ would have done; it only gets to see the coin flip and what $A$ actually did.
      Now I will prove that $A$ observes $X$ relative to $W$ in this example. First, $h^{F}(A) = \{A_{H}, A_{T}\}$, and $h^{F}(X) = \{X\}$, so we get the first condition, $A ⊥ X$. For condition 2, we break up $A$ in the obvious way set up in the problem, $A = A_{H} \lor_{S} A_{T}$, so it suffices now to show that $A_{H} ⊥ W | T$ (and it will follow symmetrically that $A_{T} ⊥ W | H$).
      I'm not going to go through the details, but $h^{F}(A_{H} | T) = \{A_{H}\}$, while $h^{F}(W | T) = \{A_{T}\}$, which are disjoint. The important thing here is that $W$ doesn’t care about $A_{H}$ in worlds in which $T$ holds.
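      The skipped details can be filled in mechanically. Below is a minimal brute-force sketch in Python, assuming the definition of conditional history from the main post. The helper names `history` and `orthogonal`, and the encoding of $S$ as triples (coin, action on heads, action on tails), are illustrative assumptions.

```python
from itertools import combinations, product

def history(n, part, E):
    """Smallest set of factor indices satisfying both history conditions."""
    E = set(E)
    for size in range(n + 1):  # smallest candidates first
        for H in map(set, combinations(range(n), size)):
            # Condition 1: inside E, agreeing on H determines the part.
            c1 = all(part(s) == part(t)
                     for s in E for t in E
                     if all(s[b] == t[b] for b in H))
            # Condition 2: mixing H-coordinates of s with the others of t stays in E.
            c2 = all(tuple(s[b] if b in H else t[b] for b in range(n)) in E
                     for s in E for t in E)
            if c1 and c2:
                return H
    raise AssertionError("unreachable: the set of all factors always works")

def orthogonal(n, p, q, E):
    """Conditional orthogonality: the two conditional histories are disjoint."""
    return history(n, p, E).isdisjoint(history(n, q, E))

# Example 2: coordinates are 0 = X (coin), 1 = A_H, 2 = A_T.
S = list(product("HT", "LR", "LR"))

def X(s):   return s[0]
def A_H(s): return s[1]
def A_T(s): return s[2]
def A(s):   return (s[1], s[2])                     # the full policy
def Y(s):   return s[1] if s[0] == "H" else s[2]    # what the agent actually does
def W(s):   return (X(s), Y(s))                     # W = X v Y

heads = [s for s in S if s[0] == "H"]
tails = [s for s in S if s[0] == "T"]

print("h(A)     =", history(3, A, S))
print("h(X)     =", history(3, X, S))
print("h(A_H|T) =", history(3, A_H, tails))
print("h(W|T)   =", history(3, W, tails))
print("A observes X relative to W:",
      orthogonal(3, A, X, S)
      and orthogonal(3, A_H, W, tails)
      and orthogonal(3, A_T, W, heads))
```

      The computed histories match the ones stated above: index 1 (the $A_{H}$ factor) for $A_{H}$ given $T$, and index 2 (the $A_{T}$ factor) for $W$ given $T$.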
      Discussion:
      So largely I am sharing this to give an example of how you can manipulate FFS combinatorially, and how you can use this to say things that you might otherwise want to say using graphs. Granted, you could also say the above things using graphs, but now you can say more things: because you are not restricted to the nodes you chose, you can ask the same combinatorial question about any of the other partitions. The benefit is largely about not being dependent on our choice of variables.
      It is interesting to try to translate this definition of observation to transparent Newcomb or counterfactual mugging, and see how some of the orthogonalities are violated, and thus it does not count as an observation.
    - [ ]
      [deleted]
      - Scott Garrabrant 24 May 2021 17:45 UTC
        LW: 2 AF: 2
        AF Parent
        I’ll try. My way of thinking doesn’t use the examples, so I have to generate them for communication.
        I can give an example of embedded observation maybe, but it will have to come after a more formal definition of observation (This is observation of a variable, rather than the observation of an event above):
        Definition: Given a FFS $F = (S, B)$, and $A$, $W$, $X$, which are partitions of $S$, where $X = \{x_{1}, \dots, x_{n}\}$, we say $A$ observes $X$ relative to $W$ if:
        1) $A ⊥ X$,
        2) $A$ can be expressed in the form $A = A_{1} \lor_{S} \dots \lor_{S} A_{n}$, and
        3) $A_{i} ⊥ W | (S ∖ x_{i})$ for each $i$.
        (This should all be interpreted combinatorially, not probabilistically.)
        The intuition of what is going on here is that to observe an event, you are being promised that you 1) do not change whether the event holds, and 3) do not change anything that matters in the case where that event does not hold. Then, to observe a variable, you can basically 2) split yourself up into different fragments of your policy, where each policy fragment observes a different value of that variable. (This whole thing is very updateless.)
        Example 1 (non-observation)
        An agent $A = \{L, R\}$ does not observe a coinflip $X = \{H, T\}$, and chooses to raise either his left or right hand. Our FFS $F = (S, B)$ is given by $S = A \times X$, and $B = \{A, X\}$. (I am abusing notation here slightly by conflating $A$ with the partition you get on $A \times X$ by projecting onto the $A$ coordinate.) Then $W$ is the discrete partition on $A \times X$.
        In this example, we do not have observation. Proof: $A$ only has two parts, so if we express $A$ as a common refinement of 2 partitions, at least one of these two partitions must be equal to $A$. However, $A$ is not orthogonal to $W$ given $H$ and $A$ is not orthogonal to $W$ given $T$. ($h^{F}(A | H) = h^{F}(W | H) = h^{F}(A | T) = h^{F}(W | T) = \{A\}$.) Thus we must violate condition 3.
        Example 2: (observation)
        An agent $A = \{LL, LR, RL, RR\}$ does observe a coinflip $X = \{H, T\}$, and chooses to raise either his left or right hand. We can think of $A$ as actually choosing a policy that is a function from $X$ to $\{L, R\}$, where the two-character string naming each part of $A$ gives the result on H followed by the result on T.
        Our FFS $F = (S, B)$ is given by $S = X \times A_{H} \times A_{T}$, and $B = \{X, A_{H}, A_{T}\}$, where $A_{H} = \{L_{H}, R_{H}\}$ represents what the agent would do seeing heads, and $A_{T} = \{L_{T}, R_{T}\}$ represents what the agent would do seeing tails. $A = A_{H} \lor_{S} A_{T}$. We also have a partition representing what the agent actually does, $Y = \{L, R\}$, where $L$ and $R$ are each four-element sets in the obvious way. We will then say $W = X \lor_{S} Y$, so $W$ does not get to see what $A$ would have done; it only gets to see the coin flip and what $A$ actually did.
        Now I will prove that $A$ observes $X$ relative to $W$ in this example. First, $h^{F}(A) = \{A_{H}, A_{T}\}$, and $h^{F}(X) = \{X\}$, so we get the first condition, $A ⊥ X$. For condition 2, we break up $A$ in the obvious way set up in the problem, $A = A_{H} \lor_{S} A_{T}$, so it suffices now to show that $A_{H} ⊥ W | T$ (and it will follow symmetrically that $A_{T} ⊥ W | H$).
        I'm not going to go through the details, but $h^{F}(A_{H} | T) = \{A_{H}\}$, while $h^{F}(W | T) = \{A_{T}\}$, which are disjoint. The important thing here is that $W$ doesn’t care about $A_{H}$ in worlds in which $T$ holds.
        Discussion:
        So largely I am sharing this to give an example of how you can manipulate FFS combinatorially, and how you can use this to say things that you might otherwise want to say using graphs. Granted, you could also say the above things using graphs, but now you can say more things: because you are not restricted to the nodes you chose, you can ask the same combinatorial question about any of the other partitions. The benefit is largely about not being dependent on our choice of variables.
        It is interesting to try to translate this definition of observation to transparent Newcomb or counterfactual mugging, and see how some of the orthogonalities are violated, and thus it does not count as an observation.