Building on what you said, pre-LLM agent foundations research appears to have made the following assumptions about what advanced AI systems would be like:
1. Decision-making processes and ontologies are separable. An AI system’s decision process can be isolated and connected to a different world-model, or vice versa (see the sketch after this list).
2. The decision-making process is human-comprehensible and has a much shorter description length than the ontology.
3. As AI systems become more powerful, their decision processes approach a theoretically optimal decision theory that can also be succinctly expressed and understood by human researchers.
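To make the separability and short-description-length assumptions concrete, here is a minimal Python sketch. It is my illustration rather than anything from the agent foundations literature, and the `WorldModel` interface, `expected_utility_policy` function, and `ToyGridWorld` class are hypothetical names. The idea is a tiny expected-utility decision rule that can be connected to any world-model exposing the right interface.

```python
from typing import Iterable, Protocol


class WorldModel(Protocol):
    """Hypothetical ontology interface: enumerate actions and the
    probability-weighted outcomes each action leads to."""

    def actions(self) -> Iterable[str]: ...

    def outcomes(self, action: str) -> Iterable[tuple[float, str]]: ...


def expected_utility_policy(model: WorldModel, utility: dict[str, float]) -> str:
    """The entire 'decision process': a few lines, separate from and far
    shorter than whatever world-model it gets plugged into."""
    return max(
        model.actions(),
        key=lambda a: sum(p * utility.get(o, 0.0) for p, o in model.outcomes(a)),
    )


class ToyGridWorld:
    """One interchangeable world-model; any other object with the same
    two methods could be swapped in without touching the policy."""

    def actions(self) -> Iterable[str]:
        return ["left", "right"]

    def outcomes(self, action: str) -> Iterable[tuple[float, str]]:
        return [(0.8, f"moved_{action}"), (0.2, "stayed")]


print(expected_utility_policy(ToyGridWorld(), {"moved_right": 1.0, "moved_left": 0.3}))
# -> "right" (expected utility 0.8 vs. 0.24 for "left")
```

The contrast with the paragraph below is the point: nothing inside an LLM factors this cleanly, and there is no component you could pull out that plays the role of `expected_utility_policy`.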
None of these assumptions ended up being true of LLMs. In an LLM, the world-model and decision process are mixed together in a single neural network instead of being separate entities. LLMs don’t come with decision-related concepts like “hypothesis” and “causality” pre-loaded; those concepts are learned over the course of training and are represented in the same messy, polysemantic way as any other learned concept. There’s no way to separate out the reasoning-related features to get a decision process you could plug into a different world-model. In addition, when LLMs are scaled up, their decision-making becomes more complex and inscrutable due to being distributed across the neural network. The LLM’s decision-making process doesn’t converge into a simple and human-comprehensible decision theory.