It may also be worth adding that transformers aren’t piecewise linear. A self-attention layer computes its mixing weights from the input itself (a softmax over query–key dot products), so it dynamically constructs pathways for information to flow through, which is very nonlinear.
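To make that concrete, here's a minimal single-head sketch in NumPy (toy sizes, random projection matrices; names like `attention_weights` are purely illustrative). The point is that the softmax mixing matrix is itself computed from the input, so the layer's routing changes with the input rather than being a fixed linear map per region:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # toy embedding dimension (hypothetical)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attention_weights(x):
    """Return the softmax routing matrix for a (seq_len, d) input."""
    q, k = x @ Wq, x @ Wk
    scores = q @ k.T / np.sqrt(d)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x1 = rng.standard_normal((3, d))
x2 = rng.standard_normal((3, d))

# A linear (or piecewise-linear) layer applies the same mixing to every input;
# here the mixing matrix itself is a function of the input.
print(attention_weights(x1).round(2))
print(attention_weights(x2).round(2))

# Consequence: the output attention_weights(x) @ (x @ Wv) multiplies two
# input-dependent terms, so the map x -> attention(x) is not piecewise linear.
```

Contrast this with a ReLU MLP, where the weight matrices are fixed and only the activation pattern changes, giving a piecewise-linear function of the input.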