My plan was to gather tree-shaped inputs and observe whether tree-transformers offer any advantage over vector transformers.
I do not think we perform attention on 1D vectors because of the data’s shape. Rather, as I mentioned earlier, we usually force our data into flattened arrays because doing so offers a multitude of pragmatic advantages that are hard to ignore.
astle dsa
Karma: 3
Having the process that implements the logic be a model itself feels closer to routing, but the premise is very interesting. Thanks for pointing me toward Sakana’s research!