RogerDearnaley comments on Most People Don’t Realize We Have No Idea How Our AIs Work

RogerDearnaley 22 Dec 2023 11:52 UTC
1 point
But there are like 10x more safety people looking into interpretability instead of how they generalize from data, as far as I can tell.)
An intriguing observation. But the ability to extrapolate accurately outside the training data is a result of building accurate world models. So to understand this, we’d need to understand the sorts of world models that LLMs build and how they interact. I’m having some difficulty immediately thinking of a way of studying that without first being a lot better at interpretability than we are now. But if you can think of one, I’d love to hear it.
- Thane Ruthenis 22 Dec 2023 14:08 UTC
  2 points
  I’m having some difficulty immediately thinking of a way of studying that
  Pretty sure that’s not what 1a3orn would say, but you can study efficient world-models directly to grok that. Instead of learning about them through the intermediary of extant AIs, you can study the very thing that these AIs are trying to approximate ever more closely.
  See my (somewhat outdated) post on the matter, plus the natural-abstractions agenda.