My plan was to gather tree-shaped inputs and observe whether tree-transformers offer any advantage over vector transformers. I do not think we perform attention on 1D vectors because of the data’s shape. Rather, as I mentioned earlier, we usually flatten our data into arrays because doing so offers many pragmatic advantages that are hard to ignore.