Donald Hobson comments on If you don’t feel deeply confused about AGI risk, something’s wrong

Donald Hobson 29 Mar 2026 9:34 UTC
2 points
I don’t think you can rescue a sense of control or “steering” from a world with superintelligence, aligned or not.
I think some level of “steering” is possible in a world with aligned AI.
Suppose someone made a superintelligence that sat in its box, worked out whether P=NP, and printed an answer of YES/NO/MAYBE. And then it shut itself down. (To be clear, this isn’t a box that the ASI can’t escape; it’s an ASI aligned to stay in its box.)
A world with ASI in which humans remain in control is possible. It requires good alignment and good coordination between humans, although the “stay in the box and do one thing” alignment feels philosophically simpler than the “coherent extrapolated volition” alignment.
This means paying a large capabilities tax. Most of the strange, wondrous and powerful things that ASI could make simply don’t exist in this world of boxed ASI.
Let’s say you want to do something more useful than the P=NP bot above. You design an ASI to cure ageing. Its main output is a chemical formula in standard notation. This AI is carefully programmed to think about the biochemistry, and only the biochemistry. It’s programmed to only go for a drug that works for standard drug biochemistry reasons. If anything at all is weird, ask a human. If the humans can’t understand, don’t.