Bogdan Ionut Cirstea answers Does the Universal Geometry of Embeddings paper have big implications for interpretability?

Bogdan Ionut Cirstea 27 May 2025 18:42 UTC
8 points
Yes, I do think this should be a big deal, even more so for monitoring than for understanding model internals. It should also have been at least somewhat predictable from theoretical results like those in "I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?" and "All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling".