Has anyone ever trained a transformer that doesn't suck without any positional information (such as positional embeddings or a causal mask)?
There's the atom transformer in AlphaFold-like architectures, although the embeddings it operates on do encode 3D positioning from earlier parts of the model, so maybe that doesn't count.
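To be concrete, here's a minimal sketch (PyTorch; the class name and dimensions are just placeholders) of what I mean by "without positional information": no positional embedding is added to the inputs and no causal mask is passed to attention, so the block is permutation-equivariant over the token axis.

```python
import torch
import torch.nn as nn


class PositionFreeBlock(nn.Module):
    """Transformer block with no positional encoding and no attention mask."""

    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No attn_mask / is_causal and no positional encoding anywhere:
        # permuting the input tokens permutes the output the same way.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        x = x + self.ff(self.norm2(x))
        return x


if __name__ == "__main__":
    x = torch.randn(2, 10, 256)      # (batch, tokens, d_model)
    perm = torch.randperm(10)
    block = PositionFreeBlock().eval()
    with torch.no_grad():
        out_a = block(x)[:, perm]    # permute after the block
        out_b = block(x[:, perm])    # permute before the block
    # True (up to float tolerance): the block is permutation-equivariant
    print(torch.allclose(out_a, out_b, atol=1e-5))
```

So the question is whether anyone has gotten a stack of blocks like this to train well on sequence data, rather than on inherently unordered inputs like atom sets.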