This does go in the direction of refuting it, but they’d still need to argue that linear probes improve with scale faster than they do for other queries; a larger model means there are more possible linear probes to pick the best from.
I don’t see why it should improve faster. It’s generally held that larger models are more interpretable because they have better representations (that’s why we prefer larger models in the first place); why should the scaling be any different for normative representations?
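The probe-selection worry in the first comment can be made concrete with a small sketch (all names and numbers hypothetical). Fit a least-squares linear probe to completely random labels: the activations carry no signal at all, yet training accuracy climbs with width, because a wider model offers strictly more linear directions for the probe to overfit.

```python
import numpy as np

def best_probe_train_accuracy(n_samples: int, n_features: int, seed: int = 0) -> float:
    """Fit a least-squares linear probe to RANDOM labels and return
    training accuracy. The 'activations' are pure noise, so anything
    above 0.5 reflects probe capacity, not representation quality."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))  # stand-in for model activations
    y = rng.integers(0, 2, n_samples) * 2 - 1         # random +/-1 labels, no structure
    w, *_ = np.linalg.lstsq(X, y, rcond=None)         # min-norm linear probe
    return float(np.mean(np.sign(X @ w) == y))

acc_narrow = best_probe_train_accuracy(100, 10)   # "small model": few directions
acc_wide = best_probe_train_accuracy(100, 400)    # "large model": many directions
```

With 400 features and only 100 samples the probe fits the random labels perfectly, while the narrow case stays well below that. This is why raw probe accuracy gains with scale are ambiguous evidence: held-out accuracy, or a comparison against matched control queries, is needed to separate better representations from a larger probe search space.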