Adam Jermyn comments on Causal confusion as an argument against the scaling hypothesis

Adam Jermyn 21 Jun 2022 22:26 UTC
I think I basically hold disagreement (1), which I think is close to Owain’s comment. Specifically, I think a plausible story for a model learning causality is:
1. The model learns a lot of correlations, most real (causal) but many spurious.
2. The model eventually groks that there’s a relatively simple causal model explaining the real correlations but not the spurious ones. This gets favored by whatever inductive bias the training process/architecture encodes.
3. The model maintains uncertainty as to whether the spurious correlations are real or spurious, the same way humans do.
In this story the model learns both a causal model and the spurious correlations. It doesn’t dismiss the spurious correlations but still models the causal ones. This lets it minimize loss, which I think addresses the counterargument to (1).
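A toy sketch of the loss claim, under assumptions that are not part of the discussion above: a linear setting where `y = 2 * cause + noise`, plus a hypothetical `spurious` feature that tracks the noise in the training data without causing `y`. All names and numbers are illustrative only. A least-squares fit that keeps both features gets lower training loss than a causal-only fit, while its weight on the causal feature stays near the true value of 2.

```python
import random

def fit_least_squares(columns, y):
    """Ordinary least squares (no intercept) for one or two feature columns."""
    k = len(columns)
    # Normal equations: (X^T X) w = X^T y
    gram = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(a * b for a, b in zip(columns[i], y)) for i in range(k)]
    if k == 1:
        return [rhs[0] / gram[0][0]]
    # Two features: solve the 2x2 system with Cramer's rule.
    det = gram[0][0] * gram[1][1] - gram[0][1] * gram[1][0]
    return [(rhs[0] * gram[1][1] - gram[0][1] * rhs[1]) / det,
            (gram[0][0] * rhs[1] - gram[1][0] * rhs[0]) / det]

def mse(columns, weights, y):
    """Mean squared error of the linear predictor on the given data."""
    n = len(y)
    return sum(
        (sum(w * col[i] for w, col in zip(weights, columns)) - y[i]) ** 2
        for i in range(n)
    ) / n

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Assumed toy data-generating process (hypothetical, for illustration only).
cause = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
y = [2 * c + e for c, e in zip(cause, noise)]
# The spurious feature correlates with y in training but does not cause it.
spurious = [e + 0.5 * rng.gauss(0, 1) for e in noise]

w_causal = fit_least_squares([cause], y)
w_both = fit_least_squares([cause, spurious], y)

mse_causal = mse([cause], w_causal, y)
mse_both = mse([cause, spurious], w_both, y)

print("causal-only weights:", w_causal, "training MSE:", mse_causal)
print("causal+spurious weights:", w_both, "training MSE:", mse_both)
```

This only illustrates the training-loss part of the story. It says nothing about step 2 (whether an inductive bias actually favors the simple causal model) or about how a real network would represent uncertainty over which correlations are spurious.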