There are only three ways we survive the Intelligence Explosion: Change my mind!
The way I see it, there are only three ways we survive the intelligence explosion (that don’t involve us just “getting lucky”). Most “AI safety research” is completely irrelevant to superintelligence.

The way I see us getting to ASI is from AIs that are more capable at AI research than the most competent human AI researchers. At that point, things start to go vertical: you use those AIs to build better AIs, which build better AIs. The problem with most alignment research is that, once this kind of recursive self-improvement (RSI) occurs, there is no reason the more capable AIs being built need to be LLMs. There is also no need for them to be Transformers, or even to use neural networks. I doubt Transformers, or even neural networks, are the most effective way to build an extremely powerful AI. If we really get “a country of geniuses in a datacenter”, they will find new algorithms and methods that would have taken humans decades to discover. They will probably use architectures we have never even thought of, and predicting which ones would be like trying to predict Transformers after the release of ELIZA.

Because of this, there seem to be only three plausible approaches that give us a good chance of survival, and all other avenues are a waste of time and talent.
Automating Alignment
If the AIs are building more powerful AIs, they could (in theory) also align them to their values. The obvious problems with this right now are that AI models may not be intelligent or coherent enough to fully understand their own values, and may not be capable enough to tell whether an AI they have created is misaligned with their goals. Another issue is value degradation across iterations: a small discrepancy, compounded over a million iterations, quickly becomes a massive one. Mechanistic interpretability (mech interp) might be useful here, if it can tell us that a current model is being honest and is actually trying to align a new model the way we think it is, but those mech interp tools will likely fail as the AI architectures get more and more alien to us. There is some work being done here, but it seems like this should be the core goal of every major lab.
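To make the compounding worry concrete, here is a toy calculation (purely illustrative; it assumes each handoff introduces a small, independent loss of fidelity, and is not a model of any real training process): if each generation passes on its values with fidelity $1-\varepsilon$, then after $n$ generations the retained fidelity is roughly

$$(1-\varepsilon)^n \approx e^{-n\varepsilon}, \qquad \text{e.g. } \varepsilon = 10^{-6},\ n = 10^{6} \;\Rightarrow\; e^{-1} \approx 0.37.$$

Even a one-in-a-million discrepancy per iteration leaves only about a third of the original fidelity after a million iterations, and that is the optimistic case where the errors don’t correlate or amplify each other.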
A theory of Universal Artificial Intelligence / AIXI Alignment
If we can’t predict what the architecture will look like, maybe we can gain a deeper understanding of what components it would need to have, what universal rules hold for all intelligent agents, or whether there is some form of Universal Alignment. Very few people seem to still be pursuing this avenue, with even MIRI pivoting to governance.
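For concreteness, the object of study here is something like Hutter’s AIXI, which defines an architecture-independent “optimal” agent: at step $k$ it chooses the action that maximizes expected total reward over a Solomonoff-style mixture of every computable environment,

$$a_k := \arg\max_{a_k}\sum_{o_k r_k}\cdots\max_{a_m}\sum_{o_m r_m}\big[r_k+\cdots+r_m\big]\sum_{q\,:\,U(q,\,a_1\ldots a_m)\,=\,o_1 r_1\ldots o_m r_m}2^{-\ell(q)},$$

where $U$ is a universal Turing machine, $q$ ranges over environment programs, $\ell(q)$ is program length, and $m$ is the horizon. The appeal of this direction is that anything you could prove about alignment at this level of generality would survive whatever architectures the AIs eventually invent.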
Strict governmental control / oversight / pause
This would require very strong and competent governmental agencies. It might mean a complete halt to progress, or extremely rigid and carefully monitored development, where every iteration has to be closely checked before the next one is approved. People might then have enough time to study the new architectures being developed by the AIs and work out how to align them. Most AI policy orgs, with only a few exceptions, do not seem to be aiming high enough: afraid of seeming extremist, they propose measures that will likely do nothing to address the core problem.
I would love to hear why I’m wrong, or why there is any reason to think ASI will resemble current AI models in any recognizable way. Obviously, many people hold some (or all) of the views I’m describing, so I’m not pretending to be the first to notice this, but I’m curious to hear from the other side: what is the rationale for thinking that any other form of research or policy is at all useful for aligning an ASI?
Claude doesn’t get it
In all my interactions with AI, there has been one recurring problem that doesn’t seem to go away: they don’t get it. I don’t know how to explain this in any better language than that, and I don’t know how to build a “Get It” benchmark. But whenever I talk to Claude, ChatGPT, Gemini, or any other model about a concept, the longer the interaction lasts, the more I get the sense that it doesn’t really “get it”.

In this way, I think AI skeptics are pointing at something real when they say these models aren’t “real intelligence”. A lot of their arguments are poor, but something truly is missing, and it doesn’t seem to change with improvements in other capabilities. GPT-5.3 seems just as bad at “getting it” as GPT-3.5. It’s not that Claude “gets” simpler concepts but struggles with more complex ones; it doesn’t seem to “get” any concepts at all, simple or otherwise.

I don’t know what this means going forward, or whether “getting it” is even needed to start an intelligence explosion. I can imagine cases where an AI could be a capable AI researcher without really “getting it”, just as models can be capable coders without it. But it could be another obstacle to alignment: the AI creating a smarter AI might fail to align it, not because it is misaligned, but simply because it is too stupid to really grasp its own values.