Isn’t the reason we do attention over 1D vectors that that’s the shape of the data we have? Do you plan to somehow get tree-shaped inputs, or is this only about the internals and the tokens will stay vector-shaped?
My plan was to gather tree-shaped inputs and observe whether tree-transformers offer any advantage over vector transformers. I do not think we perform attention on 1D vectors because of the data’s shape. Rather, as I mentioned earlier, we often force our data into flattened arrays because doing so offers pragmatic advantages that are hard to ignore.