Brendan Long comments on Tree Transformers: A step towards generalizing the transformer architecture

Brendan Long 28 Jun 2026 3:34 UTC
2 points
0
Isn’t the reason we do attention over 1D vectors that this is the shape of the data we have? Do you plan to somehow get tree-shaped inputs, or is this only about the internals, with the tokens staying vector-shaped?
- astle dsa 29 Jun 2026 22:11 UTC
  1 point
  0
  My plan was to gather tree-shaped inputs and observe whether tree transformers offer any advantage over vector transformers.
  I do not think we perform attention on 1D vectors because of the data’s shape. Rather, as I mentioned earlier, we usually force our data into flattened arrays because doing so offers pragmatic advantages that are hard to ignore.