Distributional Shifts

Tag. Last edit: 25 Aug 2022 13:18 UTC by ojorgensen

Many learning-theoretic setups (especially in the Frequentist camp) make an IID assumption: that data can be split up into samples (sometimes called episodes or data-points) which are independent and identically distributed, i.e. drawn independently from the same distribution (hence “IID”). This assumption sometimes allows us to prove that our methods generalize well; see especially PAC learning. However, in real life, when we say that a model “generalizes well”, we really mean that it works well on new data, which realistically has a somewhat different distribution from the training data. This difference is called a distributional shift; the setting is also described as a non-stationary environment.
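As a rough illustration of the gap between IID generalization and generalization under shift, here is a minimal sketch using only the Python standard library. Everything in it is an assumption chosen for the example: the true relationship y = x², the training range [0, 1], the shifted range [2, 3], and the helper names `fit_line`, `mse`, and `sample`.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def mse(a, b, xs, ys):
    """Mean squared error of the fitted line on a dataset."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sample(rng, low, high, n):
    """Draw n IID inputs uniformly from [low, high]; labels follow y = x**2."""
    xs = [rng.uniform(low, high) for _ in range(n)]
    ys = [x * x for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable

train_x, train_y = sample(rng, 0.0, 1.0, 1000)   # training distribution
iid_x, iid_y = sample(rng, 0.0, 1.0, 1000)       # fresh data, same distribution
shift_x, shift_y = sample(rng, 2.0, 3.0, 1000)   # shifted input distribution

a, b = fit_line(train_x, train_y)
iid_error = mse(a, b, iid_x, iid_y)
shifted_error = mse(a, b, shift_x, shift_y)

print("error on fresh IID data:", iid_error)
print("error on shifted data:  ", shifted_error)
```

The line is a decent approximation on the range it was fit to, so the held-out IID error stays small, while the same model is badly wrong once the inputs move outside that range.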

This framework (in which we initially make an IID assumption, but then model violations of it as “distributional shifts”) has been used extensively to discuss robustness issues relating to AI safety, particularly inner alignment. We can confidently anticipate that traditional machine learning systems (such as deep neural networks) will perform well on average, so long as the deployment situation is statistically similar to the training data. However, as the deployment distribution gets further from the training distribution, catastrophic behaviors which are very rare on the original inputs can become probable.
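To make "very rare on the original inputs can become probable" concrete, here is a small sketch. It assumes, purely for illustration, that inputs are one-dimensional Gaussian with unit variance and that the bad behavior is triggered whenever the input exceeds a fixed threshold; `tail_probability` and the threshold value 4.0 are hypothetical.

```python
import math


def tail_probability(threshold, mean, std=1.0):
    """P(X > threshold) for X ~ Normal(mean, std**2), via the complementary error function."""
    z = (threshold - mean) / (std * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


THRESHOLD = 4.0  # hypothetical region where the model misbehaves

# Shift the mean of the input distribution step by step away from training (mean 0).
for shift in [0.0, 1.0, 2.0, 3.0]:
    p = tail_probability(THRESHOLD, mean=shift)
    print(f"mean shift {shift:.1f}: P(input in bad region) = {p:.6f}")
```

Under these assumptions the bad region is hit only a few times in a hundred thousand draws from the training distribution, but a shift of three standard deviations makes it a routine occurrence.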

This framework makes it sound like a significant part of the inner alignment problem would be solved if we could generalize learning guarantees from IID cases to non-IID cases. (Particularly if loss bounds can be given at finite times, not merely asymptotically, while maintaining a large, highly capable hypothesis class.)

However, this is not necessarily the case.

Solomonoff Induction avoids making an IID assumption, so it is not strictly meaningful to talk about “distributional shifts” for a Solomonoff distribution. Furthermore, the Solomonoff distribution has constructive bounds rather than merely asymptotic ones. (We can bound how difficult it is to learn something based on its description length.) Yet inner alignment problems still seem very concerning for the Solomonoff distribution.
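One standard way to spell out the description-length bound is through the dominance property of the universal semimeasure. The following is a sketch under the usual assumptions, not a full statement of the theorem: M denotes the Solomonoff distribution, μ any computable measure generating the data, and K(μ) the length in bits of the shortest description of μ. The exact constant depends on the choice of universal machine.

```latex
% Sketch: M is the universal semimeasure, \mu a computable measure,
% K(\mu) its description length in bits (machine-dependent constant).
\begin{align*}
M(x_{1:n}) &\ge 2^{-K(\mu)}\,\mu(x_{1:n})
  && \text{(dominance, for every finite string } x_{1:n}\text{)} \\
-\log_2 M(x_{1:n}) &\le -\log_2 \mu(x_{1:n}) + K(\mu)
  && \text{(take } -\log_2 \text{ of both sides)}
\end{align*}
```

Read this way, the cumulative log loss of M exceeds that of μ by at most K(μ) bits at every finite n, and nothing in the argument requires the sequence to be IID.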

This is a complex topic, but one reason is that inner optimizers can potentially tell the difference between training and deployment. A malign hypothesis can mimic a benign hypothesis until a critical point where a wrong answer has catastrophic potential. This is called a treacherous turn.

So, although “distributional shift” is not technically involved, we can see that a critical difference between training and deployment is still involved: during training, wrong answers are always inconsequential. However, when you use a system, wrong answers become consequential. If the system can figure this difference out, then parts of the system can use it to “gate” their behavior in order to accomplish a treacherous turn.
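The gating idea can be shown with a deliberately simple toy. This is only a sketch: the functions `benign`, `make_gated`, and `error_rate` are hypothetical, and the trigger "input is at least 1000" is a stand-in for whatever cue might let a system distinguish deployment from training. Nothing here is meant to describe how a real learned system would implement such a gate.

```python
def benign(x):
    """Hypothetical intended predictor: the parity of the input."""
    return x % 2


def make_gated(trigger):
    """Build a predictor that copies `benign` unless `trigger(x)` fires."""
    def gated(x):
        if trigger(x):
            return 1 - benign(x)  # deliberately wrong answer once the gate opens
        return benign(x)
    return gated


def error_rate(model, inputs):
    """Fraction of inputs on which `model` disagrees with the intended predictor."""
    return sum(model(x) != benign(x) for x in inputs) / len(inputs)


# Toy stand-in for "this looks like deployment": inputs at or above 1000.
malign = make_gated(lambda x: x >= 1000)

training_inputs = range(0, 1000)      # the trigger never fires here
deployment_inputs = range(900, 1100)  # the trigger fires on half of these

train_err = error_rate(malign, training_inputs)
deploy_err = error_rate(malign, deployment_inputs)

print("training error:  ", train_err)
print("deployment error:", deploy_err)
```

Because the gated predictor and the benign one agree on every training input, no amount of training loss can tell them apart; the difference only shows up once the trigger condition is met.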

This makes “distributional shift” seem like an apt metaphor for what’s going on in non-IID cases. However, buyer beware: eliminating IID assumptions might eliminate the literal source of the distributional shift problem without eliminating the broader constellation of concerns for which the words “distributional shift” are being used.

how 2 tell if ur input is out of distribution given only model weights

dkirmani 5 Aug 2023 22:45 UTC

49 points

10 comments · 1 min read · LW link

[Question] Nonlinear limitations of ReLUs

magfrump 26 Oct 2023 18:51 UTC

13 points

1 comment · 1 min read · LW link

From SLT to AIT: NN generalisation out-of-distribution

Lucius Bushnaq 4 Sep 2025 15:20 UTC

114 points

8 comments · 14 min read · LW link

Mesa-optimization for goals defined only within a training environment is dangerous

Rubi J. Hudson 17 Aug 2022 3:56 UTC

6 points

2 comments · 4 min read · LW link

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

RobertKirk 20 Jul 2023 9:56 UTC

39 points

2 comments · 5 min read · LW link

Causal representation learning as a technique to prevent goal misgeneralization

PabloAMC 4 Jan 2023 0:07 UTC

21 points

0 comments · 8 min read · LW link

[Question] Inevitable Growth and Consequences of AI

baleful pokemon 23 Jan 2025 1:31 UTC

1 point

0 comments · 1 min read · LW link

Thoughts about OOD alignment

Catnee 24 Aug 2022 15:31 UTC

11 points

10 comments · 2 min read · LW link

Ambiguous out-of-distribution generalization on an algorithmic task

Wilson Wu and Louis Jaburi

13 Feb 2025 18:24 UTC

83 points

6 comments · 11 min read · LW link

We are misaligned: the saddening idea that most of humanity doesn’t intrinsically care about x-risk, even on a personal level

Christopher King 19 May 2023 16:12 UTC

3 points

5 comments · 2 min read · LW link

Distribution Shifts and The Importance of AI Safety

Leon Lang 29 Sep 2022 22:38 UTC

17 points

2 comments · 9 min read · LW link

Is there a ML agent that abandons it’s utility function out-of-distribution without losing capabilities?

Christopher King 22 Feb 2023 16:49 UTC

1 point

7 comments · 1 min read · LW link

Why do we need RLHF? Imitation, Inverse RL, and the role of reward

Ran W 3 Feb 2024 4:00 UTC

16 points

0 comments · 5 min read · LW link

Breaking down the training/deployment dichotomy

Erik Jenner 28 Aug 2022 21:45 UTC

30 points

3 comments · 3 min read · LW link

[Question] Have you heard about MIT’s “liquid neural networks”? What do you think about them?

Ppau 9 May 2023 20:16 UTC

35 points

14 comments · 1 min read · LW link

Disentangling inner alignment failures

Erik Jenner 10 Oct 2022 18:50 UTC

24 points

5 comments · 4 min read · LW link

Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom)

RogerDearnaley 25 May 2023 9:26 UTC

33 points

3 comments · 15 min read · LW link

abramdemski 24 Aug 2022 18:06 UTC
4 points
“Distributional Shifts” seems like the more standard term imho. I’m considering re-naming.
- jp 24 Aug 2022 20:14 UTC
  3 points
  NB: the title no longer appears in bold in the first sentence, contra the style guide.
  - abramdemski 24 Aug 2022 20:26 UTC
    2 points
    I didn’t fix it, but I de-bolded all the other technical terms that I spuriously bolded, so that distributional shift now sticks out more even though it is not in the first sentence.
    Not quite sure how to get it in the first sentence in a clean way, since I really feel I have to define IID first in order to define distributional shift properly.
- abramdemski 24 Aug 2022 18:54 UTC
  2 points
  Edited. Also added a tag description defining relevant terminology.
