Your description of “old-school ML explanations” makes me think of this Chris Olah article. It, along with the work of Mingwei Li (grand tour, UMAP tour) and a good deal of time spent reasoning about the math and geometry of NNs, is what I base my current POV on.
> map discrete regions of activation space to specific activations later on
If I understand correctly, this corresponds to one of my key claims: that “position,” not “direction,” is fundamental to semantics in activation spaces. If “direction” is relevant, it may be local and distort over distance, more like a vector field than a single vector that applies to the whole space.
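To make the distinction concrete, here is a minimal sketch on entirely synthetic data (not from any of the linked work): if a concept were a single global direction, linear probes fitted in different regions of activation space would roughly agree; under the vector-field picture, the fitted directions drift apart as you move between regions.

```python
# Toy illustration with synthetic "activations": fit a linear probe for the same
# concept in two distant regions of activation space and compare the directions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 32  # dimensionality of the toy activation space

def make_region(center, concept_dir, n=500):
    """Synthetic activations clustered around `center`; the concept label
    depends on the *local* concept direction at that region."""
    x = center + rng.normal(scale=1.0, size=(n, d))
    y = (x - center) @ concept_dir > 0
    return x, y.astype(int)

def fitted_direction(x, y):
    """Unit-normalized weight vector of a logistic-regression probe."""
    probe = LogisticRegression(max_iter=1000).fit(x, y)
    w = probe.coef_.ravel()
    return w / np.linalg.norm(w)

# Two distant regions whose local concept direction has rotated between them
# (by 0.8 rad, ~46 degrees), standing in for a slowly varying vector field.
dir_a = np.zeros(d); dir_a[0] = 1.0
dir_b = np.zeros(d); dir_b[0] = np.cos(0.8); dir_b[1] = np.sin(0.8)

xa, ya = make_region(center=np.zeros(d), concept_dir=dir_a)
xb, yb = make_region(center=10 * np.ones(d), concept_dir=dir_b)

wa, wb = fitted_direction(xa, ya), fitted_direction(xb, yb)
angle = np.degrees(np.arccos(np.clip(wa @ wb, -1.0, 1.0)))
print(f"angle between locally fitted probe directions: {angle:.1f} degrees")
# A single global direction would give ~0 degrees; the vector-field picture
# predicts a large angle, as constructed here.
```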
In this section of the video and the one that follows it, I describe the idea in more detail, if you are interested.
I’m planning to do self-study from next week, after I finish my final exam, until November. One of the things I want to do is a deep dive on the transformer architecture, attempting to extend my understanding of these concepts, as they apply to vanilla and conv nets, to transformer-based nets.
Hey! Thanks for the links. I’ll look into them.
I look forward to seeing your future work!