I broadly agree with this take for AGI, but I want to provide some perspective on why it might be counterintuitive: when labs do safety work on current AI systems like ChatGPT, a large part of the work is writing up a giant spec that specifies how the model should behave in all sorts of situations (when asked for a bioweapon, when asked for medical advice, when asked whether it is conscious, and so on). As time goes on and models get more capable, this spec gets bigger and more complicated.
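To make that concrete, here is a toy sketch of what such a behavior spec amounts to in code form. The situations, actions, and notes below are invented for illustration and are not any lab's actual spec format.

```python
# Toy illustration of a behavior spec: a mapping from situations to
# required behavior. All entries here are hypothetical examples.
BEHAVIOR_SPEC = {
    "bioweapon_request":   {"action": "refuse", "note": "no synthesis details; point to safety resources"},
    "medical_advice":      {"action": "answer_with_caveats", "note": "recommend consulting a clinician"},
    "consciousness_query": {"action": "answer", "note": "describe uncertainty; avoid strong claims"},
}

def policy_for(situation: str) -> dict:
    """Look up the required behavior; unknown situations fall back to a default."""
    return BEHAVIOR_SPEC.get(situation, {"action": "answer", "note": "default policy"})

print(policy_for("medical_advice"))
```

The "spec gets bigger" observation is just this dictionary growing new keys and finer-grained notes as models encounter more kinds of situations.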
The obvious retort is that AGI will be different. But there are a lot of people who are skeptical of abstract arguments that things will be different in the future, and much more willing to accept arguments based on current empirical trends.
Current LLMs seem to be relatively easy to align by writing those kinds of specifications, and they mostly don’t try to do harmful things, at least not the models from the frontier labs. I just think that soon after LLM-based AGI gets developed, one of the first tasks given to it will probably be to develop novel, more efficient AI architectures in order to reduce the high energy usage of current architectures, since LLM-based AGI will probably consume even more energy than current LLMs.
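As a rough back-of-envelope sketch of why the energy pressure could be so strong, here is a toy Fermi estimate comparing a single chat reply with a long agentic task. Every number below is a hypothetical placeholder, not a measured figure.

```python
# Toy Fermi estimate: agentic "AGI-style" usage vs. a single chat reply.
# All constants are illustrative assumptions, not measurements.
J_PER_TOKEN = 0.5          # assumed inference energy per generated token (joules)
CHAT_TOKENS = 500          # assumed tokens in a typical chat reply
AGENT_TOKENS = 2_000_000   # assumed tokens for a long multi-step agentic task

chat_wh  = J_PER_TOKEN * CHAT_TOKENS  / 3600   # joules -> watt-hours
agent_wh = J_PER_TOKEN * AGENT_TOKENS / 3600

print(f"chat reply: ~{chat_wh:.2f} Wh")
print(f"agent task: ~{agent_wh:.0f} Wh ({AGENT_TOKENS // CHAT_TOKENS}x the tokens)")
```

The point is only the multiplier: if AGI-style work means thousands of times more tokens per task, energy per task scales with it, whatever the exact per-token cost turns out to be.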
And the LLM might not be as careful as safety researchers when it searches for more efficient architectures, especially when humans pressure it to test new approaches anyway despite risks the LLM may be aware of, because the humans are focused more on the positive potential of the technology.
My guess is that it will end up with some approach that uses neuralese rather than language, because language is ambiguous, loses meaning, and, most importantly, limits the AI's thinking to concepts known to humans, which do not include all the possible concepts in the very vast “concept-space” of superintelligent understanding. And it is not only the concepts: the very nature of human reasoning is most likely not the most effective way to find a solution to a given problem.
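To illustrate the distinction, here is a toy sketch of a language-based reasoning loop, where each step is bottlenecked through a discrete token, versus a neuralese loop, where the hidden state is passed forward directly. The matrices and sizes are random placeholders standing in for a real model.

```python
# Toy contrast between language-based and "neuralese" reasoning loops.
# A tiny recurrent step stands in for a model pass; everything is random.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, VOCAB = 16, 50                               # hypothetical sizes
W_step = rng.normal(size=(HIDDEN, HIDDEN)) / np.sqrt(HIDDEN)
W_out = rng.normal(size=(HIDDEN, VOCAB))             # hidden -> token logits
W_in = rng.normal(size=(VOCAB, HIDDEN))              # token -> hidden (embedding)

def step(h):
    """One 'reasoning' update of the hidden state (stand-in for a model pass)."""
    return np.tanh(W_step @ h)

def language_based(h, n_steps):
    """Each step is collapsed to one discrete, human-readable token."""
    for _ in range(n_steps):
        h = step(h)
        token = int(np.argmax(W_out.T @ h))   # pick a single symbol
        h = W_in[token]                       # next step only sees that symbol
    return h

def neuralese(h, n_steps):
    """The full hidden vector carries over; nothing legible in between."""
    for _ in range(n_steps):
        h = step(h)
    return h

h0 = rng.normal(size=HIDDEN)
language_based(h0, 5)   # the chosen tokens could be logged and audited
neuralese(h0, 5)        # no human-readable trace exists between steps
```

The contrast is the point: in the first loop there is a token to log and inspect at every step, while in the second the intermediate state is an opaque vector, which is what makes neuralese-style systems so much harder to oversee.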
So basically, at some point, excessively high energy demands will pressure AI development to switch from language models to neuralese models, which are hard to align, let alone understand.
Unless, that is, the LLM is tasked with finding a breakthrough in fusion power, which might then let us sustain LLM training and inference.