Jonathan Richens

Karma: 91

Research scientist, alignment team, Google DeepMind

Jonathan Richens 20 Dec 2024 21:52 UTC
15 points
1
on: An Illustrated Summary of “Robust Agents Learn Causal World Model”
This is such a good deep dive into our paper, which I will be pointing people to in the future. Thanks for writing it!
Agree that conditioning on the intervention is unnatural for agents. One way around this is to note that adapting to an unknown distributional shift given only sensory inputs Pa_D is strictly harder than adapting to a known distributional shift (given Pa_D and sigma). It follows that any agent capable of adapting given only its sensory inputs must have learned a CWM (footnotes, p6).

Incentives from a causal perspective

tom4everitt, James Fox, RyanCarey, mattmacdermott, sbenthall and Jonathan Richens

10 Jul 2023 17:16 UTC

27 points

0 comments6 min readLW link

Agency from a causal perspective

tom4everitt, mattmacdermott, James Fox, Francis Rhys Ward and Jonathan Richens

30 Jun 2023 17:37 UTC

40 points

5 comments6 min readLW link

Jonathan Richens 25 Jun 2023 9:36 UTC
1 point
0
in reply to: Alex_Altair’s comment on: Causality: A Brief Introduction
Yes I can flip two independent coins a finite number of times and get strings that appear to be correlated. But in the asymptotic limit the probability they are the same (or correlated at all) goes to zero. Hence, two causally unrelated things can appear dependent for finite sample sizes. But when we have infinite samples (which is the limit we assume when making statements about probabilities) we get P(a,b) = P(a)P(b).

Jonathan Richens 22 Jun 2023 13:06 UTC
2 points
0
in reply to: Alex_Altair’s comment on: Causality: A Brief Introduction
Thanks for commenting! This is an interesting question and answering it requires digging into some of the subtleties of causality. Unfortunately the time series framing you propose doesnt work because this time series data is not iid (the variable A = “the next number out of program 1” is not iid), while by definition the distributions P(A), P(B) and P(A,B) you are reasoning with are assuming iid. We really have to have iid here, otherwise we are trying to infer correlation from a single sample. By treating non-iid variables as iid we can see correlations where there are no correlations, but those correlations come from the fact that the next output depends on the previous output, not because the output of one program depends on the output of the other program.
We can fix this by imagining a slightly different setup that I think is faithful to your proposal. Basically the same thing but instead of computing pi, both the programs have in memory a random string of bits, with 0 or 1 occurring with probability ¹⁄₂ for each bit. Both programs just read out the string. Let the string of random bits be identical for program 1 and 2. Now, we can describe each output of the programs as iid. If these are the same for both program, the outputs of the programs are perfectly correlated. And you are right, by looking at the output of one of the programs I can update by beliefs on the output of the other program.
Then we need to ask, how do we generate this experiment? To get the string of random bits we have to sample a coin flip, and then make two copies of the outcome and send it to both programs. If we tried to do this with two coins separately at different ends of the universe, we would get diffrent bit strings. So the two programs have in their past light cones a shared source of randomness—this is the common cause.

Jonathan Richens 20 Jun 2023 20:48 UTC
8 points
7
in reply to: Alex_Altair’s comment on: Causality: A Brief Introduction
In the example of the two programs, we have to be careful with what we mean by statistical correlation v.s. more standard / colloquial use of the term. Im assuming here when you say `the same program running on opposite ends of the universe, and their outputs would be the same’ that you are referring to a deterministic program (else, there would be no guarantee that the outputs were the same). But, if the output of the two programs is deterministic, then there can be no statistical correlation between them. Let A be the outcome of the first program and B the outcome of the second. To measure statistical correlation we have to run the two programs many times generating i.i.d. samples of A and B, and they are correlated if P(A, B) is not equal to P(A)P(B). But if the two programs are deterministic, say A = a and B = b with probability 1, then they are not statistically correlated, as P(A = a, B = b) = 1 and P(A = a)P(B = b) = 1. So to get some correlation the output of the programs have to be random. To have two random algorithms generating correlated outcomes, they need to share some randomness they can condition their outputs on, i.e. a common cause. With the two planets example, we run into the same problem again. (PS by correlation Reichenbach means statistical dependence here rather than e.g. Pearson correlation, but the same argument applies).

Broadly speaking, to point of the (perhaps confusing) reference in the article is to say that if we accept the laws of physics, then all the things we observe in the universe are ultimately generated by causal dynamics (e.g. the classical equations of motion being applied to some initial conditions.) We can always describe these causal dynamics + initial conditions using a causal model. So there is always `some causal model’ that describes our data.

Causality: A Brief Introduction

tom4everitt, Lewis Hammond, Jonathan Richens, Francis Rhys Ward, RyanCarey, sbenthall and James Fox

20 Jun 2023 15:01 UTC

49 points

18 comments6 min readLW link

Jonathan Richens

In­cen­tives from a causal perspective

Agency from a causal perspective

Causal­ity: A Brief Introduction

Incentives from a causal perspective

Causality: A Brief Introduction