Matrice Jacobine comments on Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Matrice Jacobine 12 Feb 2025 13:30 UTC
3 points
I don’t see why it should improve faster. It’s generally held that larger models are more interpretable because they have better representations (that’s why we prefer larger models in the first place). Why should that scale any differently for normative representations?