Vivek Hebbar comments on What’s up with LLMs representing XORs of arbitrary features?

Vivek Hebbar 7 Jan 2024 6:07 UTC
LW: 1 AF: 1
“Maybe models track which features are basic and enforce that these features be more salient.”
Couldn’t the model just write derived features more weakly, so that it needs no tracking mechanism other than the magnitude itself?
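A toy sketch of what this could look like, under loud assumptions: features are linear directions in activation space, “salience” means the variance of the activations along a feature’s direction, and the width, sample count, and write scales below are made-up numbers rather than anything measured in a real model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 4096  # hypothetical hidden width and number of samples

# Two "basic" binary features and one derived feature (their XOR).
a = rng.integers(0, 2, size=n)
b = rng.integers(0, 2, size=n)
xor = a ^ b

# Assumption: each feature gets its own orthonormal direction.
q, _ = np.linalg.qr(rng.standard_normal((d, 3)))
dir_a, dir_b, dir_xor = q.T

# Hypothetical write scales: basic features strongly, derived ones weakly.
BASIC_SCALE, DERIVED_SCALE = 1.0, 0.2


def centered(f):
    """Map a {0, 1} feature to {-0.5, +0.5}."""
    return f - 0.5


# Activations are a sum of scaled feature directions; nothing else is stored.
acts = (
    BASIC_SCALE * np.outer(centered(a), dir_a)
    + BASIC_SCALE * np.outer(centered(b), dir_b)
    + DERIVED_SCALE * np.outer(centered(xor), dir_xor)
)


def salience(acts, direction):
    """Variance of the activations along a unit direction."""
    return float(np.var(acts @ direction))


sal = {
    "a": salience(acts, dir_a),
    "b": salience(acts, dir_b),
    "a XOR b": salience(acts, dir_xor),
}
for name, value in sal.items():
    print(f"{name}: {value:.4f}")
```

In this sketch the only record of which features are basic is the write scale itself: the derived direction carries roughly `DERIVED_SCALE ** 2` times the variance of a basic one, with no separate bookkeeping.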
- Sam Marks 7 Jan 2024 16:49 UTC
  LW: 2 AF: 1
  Some features which are computed from other features should probably themselves be treated as basic and thus represented with large salience.