This interpretation is straightforwardly refuted (insofar as it makes any positivist sense) by the fact that the success of the parametric approach in “Internal Utility Representations” is also correlated with model size.
This does go in the direction of refuting it, but they’d still need to argue that linear probes for utility improve with scale faster than probes for other queries do; a larger model offers more possible linear probes to pick the best from.
I don’t see why they should improve faster. It’s generally held that the increased interpretability of larger models comes from their having better representations (that’s why we prefer larger models in the first place), so why should the scaling be any different for normative representations?