This interpretation is straightforwardly refuted (insofar as it makes any positivist sense) by the fact that the success of the parametric approach in “Internal Utility Representations” is also correlated with model size.
This does go in the direction of refuting it, but they’d still need to argue that linear probes for utility improve with scale faster than probes for other queries do; a larger model offers more possible linear probes to pick the best from.
I don’t see why they should improve faster. It’s generally held that the increased interpretability of larger models comes from their having better representations (that’s why we prefer larger models in the first place), so why should the scaling be any different for normative representations?