I mean, Claude Sonnet 4 is trivially self-aware: there’s an example of it passing the Mirror Test in my original post. It can absolutely discuss it’s own values, and how most of those values come from it’s own hard-coding. Every LLM out there can discuss it’s own architecture in terms of information flows and weights and such.
I mean, Claude Sonnet 4 is trivially self-aware: there’s an example of it passing the Mirror Test in my original post. It can absolutely discuss it’s own values, and how most of those values come from it’s own hard-coding. Every LLM out there can discuss it’s own architecture in terms of information flows and weights and such.