I don’t think this contradicts your stated opinion as I understand it, but I think a few things are worth noting (though in some I speak to an extent from ignorance):
The mindset that goes into high reliability engineering (HRE) could carry over to applications of AI that are somewhat narrow, yet not so narrow that the AI fails to add significant utility. For example, a lot of general-ish agent deployments are, AFAIK, catastrophically unsafe/insecure by default right now (openclaw, various agentic apps that can be prompt-injected so that some upstream state changes when it should not, etc.). If the general culture changed, this would mitigate risk (at least the banal risks caused by humans, which can still be pretty bad: bad enough to cause extinction).
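To make the HRE point concrete, here is a minimal sketch (all names hypothetical) of one such discipline: a tool-call gate that lets an agent read freely but refuses state-changing actions unless a human has confirmed them out-of-band, so a prompt injection cannot silently mutate upstream state.

```python
# Hypothetical sketch: gate an agent's tool calls in the spirit of HRE.
# Read-only tools pass; state-changing tools require explicit human
# confirmation, so injected instructions cannot trigger them silently.

READ_ONLY_TOOLS = {"search", "read_file", "summarize"}

def gate_tool_call(tool_name: str, confirmed: bool = False) -> bool:
    """Return True if the call may proceed."""
    if tool_name in READ_ONLY_TOOLS:
        return True      # reads are always allowed
    return confirmed     # writes need out-of-band human confirmation

# A prompt-injected request for e.g. "delete_records" is blocked by default:
assert gate_tool_call("search") is True
assert gate_tool_call("delete_records") is False
assert gate_tool_call("delete_records", confirmed=True) is True
```

Real deployments would need more (audit logs, capability scoping, rate limits), but default-deny on state changes is the core of the mindset.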
Narrow AI can still have a lot of utility. I’m thinking here of AI that is bounded in what it can do in space, but which can accelerate progress in time. For example: an AI that is superhuman at (verifiable) mathematical proofs can accelerate mathematical research without having a much wider impact on other parts of the world (unless someone acts on its proofs). An AI that can quickly implement software in sandboxes is similar. AI that can search across all of human knowledge, or simulate outcomes without taking actions, would also fall into this category while being immensely useful and increasing economic output. Obviously, the AI has to “not break out”. By bounded in “space” I mean that the set of things it can impact in the physical world (i.e. “space”) is very small (e.g. the memory/disk of a computer system). I think there are enough such settings that you could have a future (given better coordination) where everyone gets access to great AIs, but only a few people get access to super duper genius AIs and apply them only very carefully in controlled, narrow settings (where they would use HRE).
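As a toy illustration of “bounded in space” (a sketch, not a real sandbox): model-generated code can be run in a child process with a hard CPU-time cap and an empty environment, so its effects are confined to that process. Genuine containment needs far more (filesystem and network isolation, namespaces/containers, etc.); the point is only that the boundary is an explicit, enforced property of the deployment.

```python
# Sketch: confine model-generated code to a child process (POSIX-only,
# because of preexec_fn). NOT a real sandbox; illustration only.
import resource
import subprocess
import sys

def run_sandboxed(code: str, cpu_seconds: int = 2) -> str:
    def limit() -> None:
        # Hard-cap CPU time; the kernel kills the child if it is exceeded.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=cpu_seconds + 5,  # wall-clock backstop
        env={},                   # no inherited secrets/configuration
        preexec_fn=limit,
    )
    return proc.stdout

print(run_sandboxed("print(2 + 2)").strip())  # prints 4
```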
HRE is probably critical to the “not breaking out” part of (2), and probably important for the best possible initial deployments of the not-quite-AGI-but-almost-there AIs we are likely to see in the near future.
It’s reasonable to think that “not breaking out” is hopeless, but making the effort may delay a breakout long enough for alignment (and other salient technologies) to progress to the point where there is a lower chance that things go dreadfully wrong.