If we take LLMs to be simulators, they'd necessarily need some function that maps the simulation-state to a probability distribution over the output tokens.
I disagree with this picture. “Simulators” just describes the external behavior of the model, and doesn’t imply LLMs internally function anything like the programs humans write when we want to simulate something, or like our intuitive notions of what a simulator ought to do.
I think it’s better to start with what we’ve found of deep network internal structures, which seem to be exponentially large ensembles of fairly shallow paths, and then think about what sort of computational structures would be consistent with that information while also 1) achieving low loss, and 2) being plausibly findable by SGD from a random init.
My tentative guess is that LLMs internally look like a fuzzy key-value lookup table over a vast quantity of (mostly shallow) patterns about text content. They do some sort of similarity matching between the input texts and the features that different stored patterns “expect” in any text to which the pattern applies. Any patterns which trigger then quickly add their predictions into the residual stream, similar to what’s described here.
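To make the picture concrete, here is a toy sketch (purely illustrative, not a claim about any actual model's weights) of that fuzzy key-value lookup: stored patterns each have a key vector encoding the features they "expect", the input is matched against all keys by similarity, and triggered patterns write their prediction vectors straight into a residual stream that a single shared unembedding reads out as logits. All names and dimensions here are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_patterns, vocab = 16, 100, 50

pattern_keys = rng.normal(size=(n_patterns, d_model))    # features each pattern "expects"
pattern_values = rng.normal(size=(n_patterns, d_model))  # the prediction it writes
unembed = rng.normal(size=(d_model, vocab))              # single shared readout to logits

def layer(residual):
    # Fuzzy matching: similarity between the current stream and each stored key.
    scores = pattern_keys @ residual
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Triggered patterns add their predictions directly into the residual
    # stream; there is no per-pattern translation step before the readout.
    return residual + weights @ pattern_values

residual = rng.normal(size=d_model)
for _ in range(4):           # a few shallow, ~independent passes
    residual = layer(residual)
logits = residual @ unembed  # one shared unembedding at the very end
```

The point of the sketch is the second comment below about translation costs: because every pattern writes in the *same* residual basis that the shared `unembed` reads from, no pattern needs its own state-to-logits conversion.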
In such a structure, having any significant translation step between the internal states of the predictive patterns and the output logits would be a huge issue, because you’d have to replicate that translation across the network many times, not just once per layer, but many times per layer, because single layers are implementing many ~independent paths simultaneously.
I do agree that LLM architectures seem poorly suited to learning the sorts of algorithms I think people imagine when they say stuff like “general purpose search”. However, I take that as an update against those sorts of algorithms being important for powerful cognition, especially considering that transformers have been the SOTA architecture for over 5 years while remaining essentially unchanged, despite many, many people trying to improve on them.
Fair enough, I don’t disagree that it’s how current LLMs likely work.
I maintain, however, that it makes me very skeptical that their architecture is AGI-complete. In particular, I expect it's incapable of supporting the sort of high-fidelity simulations that people often talk about in the context of e.g. accelerating alignment research. Conversely, I expect that architectures powerful enough for that would be different enough to support search, and would therefore carry the dangers of inner misalignment.
I can sort of see the alternate picture, though, where the shallow patterns they implement include some sort of general-enough planning heuristics that’d theoretically let them make genuinely novel inferences over enough steps. I think that’d run into severe inefficiencies… but my intuition on that is a bit difficult to unpack.
Hm. Do you think the current LLM architectures are AGI-complete, if you scale them enough? If yes, how do you imagine they’d be carrying out novel inferences, mechanically? Inferences that require making use of novel abstractions?