Adam Shai comments on Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai 17 Apr 2024 15:39 UTC
This looks highly relevant! Thanks!
- Ran W 18 Apr 2024 15:53 UTC
  This reminds me of the paper Chris linked as well. I think there’s very solid evidence for a relationship between the kind of meta-learning Transformers go through and Bayesian inference (e.g., see this, this, and this). The main question I have been thinking about is what a state is for language and, if such states can be discovered this way, how they could be useful. For state-based RL/control tasks this seems relatively straightforward (e.g., see this and this), but it is much less clear for more abstract tasks. It’d be great to hear your thoughts!
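One common way to formalize "state" in the RL/control setting is a belief state: a posterior distribution over the hidden states of a generating process, updated by Bayes' rule as each observation arrives. A minimal sketch follows, assuming a hidden Markov model with known transition and emission matrices. The function names and the 2-state, 2-token process are hypothetical illustrations and are not taken from the post or the linked papers.

```python
import numpy as np

def update_belief(belief, transition, emission, obs):
    """One Bayesian filtering step over the hidden states of an HMM.

    belief:     P(s_{t-1} | x_{<t}), shape (n_states,)
    transition: transition[i, j] = P(s_t = j | s_{t-1} = i)
    emission:   emission[j, x] = P(x_t = x | s_t = j)
    obs:        index of the observed token x_t
    """
    predicted = belief @ transition              # P(s_t | x_{<t})
    unnormalized = predicted * emission[:, obs]  # P(s_t, x_t | x_{<t})
    return unnormalized / unnormalized.sum()     # P(s_t | x_{<=t})

def belief_trajectory(prior, transition, emission, observations):
    """Return the belief state after each observed token, one row per step."""
    beliefs = []
    belief = np.asarray(prior, dtype=float)
    for obs in observations:
        belief = update_belief(belief, transition, emission, obs)
        beliefs.append(belief)
    return np.stack(beliefs)

# Hypothetical 2-state, 2-token process (illustrative numbers only).
T = np.array([[0.9, 0.1],
              [0.1, 0.9]])
E = np.array([[0.8, 0.2],
              [0.3, 0.7]])

traj = belief_trajectory([0.5, 0.5], T, E, [0, 0, 1])
print(traj)
```

Each row of `traj` is a point on the probability simplex over hidden states. For a structured process like this, the set of reachable belief states is the kind of geometry one could look for in a model's activations. For language, the open question above is what the hidden states (and hence the simplex) would even be.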