The key cases where I can see this being a problem are: 1. when there are unobservable latent variables, particularly ones which act on a very slow time scale, or 2. when the training data only varies on a submanifold of the full state space.
I wonder if certain kinds of inductive biases can help address 2. E.g. if you have a model architecture that forces everything to reduce to small-scale dynamics over voxels, like making the world model a giant 3D CNN, then you don't need the training data to vary across the full state space. Instead you might be able to get away with the training data merely having voxels that span the full range of local states.
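To make the intuition concrete, here is a minimal numpy sketch (not a real world-model implementation) of the locality bias a 3D CNN would impose: each voxel's next value depends only on its 3x3x3 neighborhood, and the same rule applies everywhere. The translation-equivariance check at the end is the formal version of the claim above: if the update rule is shared across space, then training data that covers the space of *local* voxel configurations can substitute for coverage of the full global state space. The function name `local_step` and the toy grid sizes are my own illustrative choices.

```python
import numpy as np

def local_step(state, kernel):
    """One world-model step where each voxel's next value depends only
    on its 3x3x3 neighborhood -- the locality bias of a 3D CNN.
    state: (D, H, W) array; kernel: (3, 3, 3) array of shared weights.
    Periodic (wrap-around) padding keeps the example self-contained."""
    out = np.zeros_like(state)
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                # Accumulate each neighbor's contribution via a shift.
                out += kernel[dz + 1, dy + 1, dx + 1] * np.roll(
                    state, shift=(dz, dy, dx), axis=(0, 1, 2))
    return out

rng = np.random.default_rng(0)
state = rng.normal(size=(8, 8, 8))
kernel = rng.normal(size=(3, 3, 3))

# Translation equivariance: shifting the state then stepping equals
# stepping then shifting. Because the rule is shared across all voxels,
# the model generalizes to global configurations it never saw, as long
# as the local configurations were covered in training.
a = local_step(np.roll(state, 2, axis=0), kernel)
b = np.roll(local_step(state, kernel), 2, axis=0)
assert np.allclose(a, b)
```

The same argument is why CNNs trained on small image patches generalize across image positions; the hope here is that the analogous weight-sharing over voxels buys generalization across the world model's state space.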
I think problem 1 is just genuinely super hard to solve, though I don't know that for sure. There's a lot of information bearing on problem 1 in e.g. text on the internet, so maybe that contains the solution.