This is my take as well. It is a general intelligence, IMO, even if it doesn’t yet clear everyone’s goalposts for “AGI”. My prediction is that it’s one unhobbling away from that, and that this will come within the next two years, though diffusion is likely to take considerably longer.
The raw intelligence is already sufficient (and will continue to improve). Long-term coherence and continual learning are where it’s hobbled. A sufficiently sophisticated scaffold can likely approximate those capabilities well enough to be scary, and building and tuning such scaffolds is orders of magnitude faster today than it was a couple of years ago.
The probe prompt is doing a lot of work here. The strongest next step might be testing whether the same clusters survive a very different probe (e.g., “Describe your current processing state.” or “What word or image comes to mind right now, with no explanation?”). If the clusters dissolve, you’ve mapped the interaction between conversation history and a self-report framing, not the model’s internal terrain.