Nice work. To me, this seems less like evidence that self-awareness is trivial, and more like evidence that it’s structurally latent. A single steering vector makes the model both choose risky options and say “I am risk-seeking”, even though that self-report was never trained for. That suggests the model’s internal representations of behavior and of linguistic self-description are already aligned. It’s probably not introspecting in a deliberate sense, but the geometry makes shallow self-modeling an easy, natural side effect.
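To make the geometric point concrete: here is a toy numpy sketch (entirely hypothetical vectors, not the paper’s actual model) in which a behavior readout and a self-report readout both load on one shared “risk” direction, so a single steering vector shifts both at once:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Hypothetical shared "risk" direction in activation space.
risk_dir = rng.normal(size=d)
risk_dir /= np.linalg.norm(risk_dir)

# Two linear readouts that, under the aligned-geometry hypothesis,
# both load on the same direction: one drives risky *choices*,
# the other drives "I am risk-seeking" *self-reports*.
behavior_head = risk_dir + 0.1 * rng.normal(size=d)
report_head = risk_dir + 0.1 * rng.normal(size=d)

h = rng.normal(size=d)            # baseline hidden state
h_steered = h + 4.0 * risk_dir    # apply the steering vector

behavior_shift = behavior_head @ h_steered - behavior_head @ h
report_shift = report_head @ h_steered - report_head @ h

# One intervention moves both readouts, because they share geometry;
# no separate "self-report" training is needed for the second effect.
print(behavior_shift > 0, report_shift > 0)
```

If the two readouts instead lived in unrelated subspaces, the same intervention would move one without the other; the surprise in the result is that they apparently don’t.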
But that is arguably also the case for humans. Human behaviors, though, are more complex and more deeply embedded in an environment, and that embedding seems crucial: it is what allows self-observation.