Rohin Shah comments on The Lightcone Theorem: A Better Foundation For Natural Abstraction?

Rohin Shah 15 May 2023 5:45 UTC
LW: 6 AF: 5
Okay, that mostly makes sense.
note that the resampler itself throws away a ton of information about $X^{0}$ while going from $X^{0}$ to $X^{T}$ . And that is indeed information which “could have” been relevant, but almost always gets wiped out by noise. That’s the information we’re looking to throw away, for abstraction purposes.
I agree this is true, but why does the Lightcone theorem matter for it?
It is also a theorem that a Gibbs resampler initialized at equilibrium will produce $X^{T}$ distributed according to $X$ , and as you say it’s clear that the resampler throws away a ton of information about $X^{0}$ in computing it. Why not use that theorem as the basis for identifying the information to throw away? In other words, why not throw away information from $X^{0}$ while maintaining $X^{T} \sim X$ ?
EDIT: Actually, conditioned on $X^{0}$ , it is not the case that $X^{T}$ is distributed according to $X$ .
(Simple counterexample: Take a graphical model where node A can be 0 or 1 with equal probability, and A causes B through a chain of more than $2T$ steps, such that we always have B = A for a true sample from $X$. In such a setting, for a true sample from $X$, B should be equally likely to be 0 or 1, but $(B^{T} \mid X^{0}) = B^{0}$, i.e. it is deterministic.)
Of course, this is a problem for both my proposal and for the Lightcone theorem—in either case you can’t view $X^{0}$ as a latent that generates $X$ (which seems to be the main motivation, though I’m still not quite sure why that’s the motivation).
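The counterexample can be checked numerically. What follows is only a sketch under stated assumptions: the function names, the `eps` noise parameter, and the sequential-sweep update order are illustrative choices and may differ from the resampler in the original post. With `eps=0.0` every node copies its parent exactly, so each node's conditional given its neighbors is deterministic and resampling never changes the sample.

```python
import random

def sample_chain(n, eps, rng):
    """Draw a true sample from X: A is a fair coin and each later node
    copies its parent, flipping with probability eps (hypothetical knob)."""
    x = [rng.randint(0, 1)]
    for _ in range(n - 1):
        flip = rng.random() < eps
        x.append(x[-1] ^ int(flip))
    return x

def copy_prob(child, parent, eps):
    """P[child | parent] for the noisy-copy chain."""
    return 1.0 - eps if child == parent else eps

def gibbs_sweep(x, eps, rng):
    """Resample every node once, in order, from its conditional given its
    chain neighbors (sequential sweep: an assumption of this sketch)."""
    n = len(x)
    x = list(x)
    for k in range(n):
        weights = []
        for v in (0, 1):
            # Factor from the parent (or the fair prior for the root A).
            w = 0.5 if k == 0 else copy_prob(v, x[k - 1], eps)
            # Factor from the child, if there is one.
            if k + 1 < n:
                w *= copy_prob(x[k + 1], v, eps)
            weights.append(w)
        total = weights[0] + weights[1]
        if total > 0:  # if both weights vanish, leave the node unchanged
            x[k] = 1 if rng.random() < weights[1] / total else 0
    return x

def run_resampler(x0, steps, eps, rng):
    """Apply `steps` Gibbs sweeps starting from x0 and return X^T."""
    x = x0
    for _ in range(steps):
        x = gibbs_sweep(x, eps, rng)
    return x

def fraction_b_unchanged(n, steps, eps, trials, seed=0):
    """Fraction of trials in which B^T equals B^0 (B is the last node)."""
    rng = random.Random(seed)
    same = 0
    for _ in range(trials):
        x0 = sample_chain(n, eps, rng)
        xt = run_resampler(x0, steps, eps, rng)
        same += xt[-1] == x0[-1]
    return same / trials

# Chain with 11 edges (> 2T for T = 5) and exact copying.
print(fraction_b_unchanged(n=12, steps=5, eps=0.0, trials=1000))  # 1.0
```

Setting `eps` above zero lets the resampler move away from $X^{0}$, which is one way to see how much of $X^{0}$ survives into $X^{T}$ when the dependence is noisy instead of exact.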
- johnswentworth 15 May 2023 16:18 UTC
  LW: 4 AF: 4
  Sounds like we need to unpack what “viewing $X^{0}$ as a latent which generates $X$ ” is supposed to mean.
  I start with a distribution $P [X]$ . Let’s say $X$ is a bunch of rolls of a biased die, of unknown bias. But I don’t know that’s what $X$ is; I just have the joint distribution of all these die-rolls. What I want to do is look at that distribution and somehow “recover” the underlying latent variable (bias of the die) and factorization, i.e. notice that I can write the distribution as $P [X] = \sum_{Λ} \prod_{i} P [X_{i} | Λ] P [Λ]$ , where $Λ$ is the bias in this case. Then when reasoning/updating, we can usually just think about how an individual die-roll interacts with $Λ$ , rather than all the other rolls, which is useful insofar as $Λ$ is much smaller than all the rolls.
  Note that $P [X | Λ]$ is not supposed to match $P [X]$ ; then the representation would be useless. It’s the marginal $\sum_{Λ} \prod_{i} P [X_{i} | Λ] P [Λ]$ which is supposed to match $P [X]$ .
  The lightcone theorem lets us do something similar. Rather than all the $X_{i}$’s being independent given $Λ$ , only those $X_{i}$’s sufficiently far apart are independent, but the concept is otherwise similar. We express $P [X]$ as $\sum_{X^{0}} P [X | X^{0}] P [X^{0}]$ (or, really, $\sum_{Λ} P [X | Λ] P [Λ]$ , where $Λ$ summarizes info in $X^{0}$ relevant to $X$ , which is hopefully much smaller than all of $X$ ).
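  A small worked instance of the latent-variable picture, under assumptions that are not in the comment itself: two coin flips stand in for the die rolls, and the bias $Λ = P[\text{heads}]$ is either 0.2 or 0.8 with equal probability.

  $$P[X_1 = H, X_2 = H] = \sum_{Λ} P[X_1 = H | Λ] \, P[X_2 = H | Λ] \, P[Λ] = 0.5 \cdot (0.2)^2 + 0.5 \cdot (0.8)^2 = 0.34$$

  $$P[X_1 = H] \, P[X_2 = H] = (0.5 \cdot 0.2 + 0.5 \cdot 0.8)^2 = 0.25$$

  The flips are dependent under $P[X]$ (0.34 versus 0.25) yet independent given $Λ$ . The conditional $P[X_1 = H, X_2 = H | Λ = 0.8] = 0.64$ does not match the 0.34 from $P[X]$ ; only the $Λ$-weighted sum does.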
  - Rohin Shah 15 May 2023 17:23 UTC
    LW: 4 AF: 4
    Okay, I understand how that addresses my edit.
    I’m still not quite sure why the lightcone theorem is a “foundation” for natural abstraction (it looks to me like a nice concrete example on which you could apply techniques) but I think I should just wait for future posts, since I don’t really have any concrete questions at the moment.
    - Thane Ruthenis 16 May 2023 2:11 UTC
      LW: 4 AF: 3
      I’m still not quite sure why the lightcone theorem is a “foundation” for natural abstraction (it looks to me like a nice concrete example on which you could apply techniques)
      My impression is that its being a concrete example is the reason. “What is the right framework to use?” and “what is the environment-structure in which natural abstractions can be defined?” are core questions of this research agenda, and this sort of multi-layer locality-including causal model is one potential answer.
      The fact that it loops in the speed of causal influence is also suggestive: that speed seems fundamental to the structure of our universe and crops up in a lot of places, so the proposition that natural abstractions are somehow downstream of it is interesting.