you can clearly see diffusion models being piles of masks, because their mistakes are visibly the mistakes of piecewise-linear models. and you can see very small linear models being piles of masks directly when you handcraft them as a 2d or 3d shader:
-
ARTICLE: random neural networks visualized as fields for artistic purposes. tanh activation, so not actually a “linear fragments” model like max-matmul nets (relu, etc), but still a very interesting visualization:
-
SHADER: 2d animated space folding shader - 2d intuition for what's going on in the following 3d one https://www.shadertoy.com/view/tltSWs
-
SHADER: 3d animated simple space folding shader - space folding animated. the shader code is slightly hard to read; the modified version at the end has easier-to-read code.
-
ARTICLE: geometric illustrations for neural networks—visualization of how decision boundaries move in a very small neural field
-
VIDEO: stable diffusion animation - you can see "whoops, found a decision boundary!" moments all throughout the video, which gives some intuition for what smooth decision boundaries look like as you move past them in the net's io space
-
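the "pile of masks" picture is easy to reproduce outside a shader. a minimal python sketch (my own toy example, not code from any of the links above): a tiny random relu net over a 2d grid, where each point's pattern of active relus is a binary mask, and every distinct mask selects one affine piece of the function.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)  # 2d input -> 16 hidden relus
w2, b2 = rng.normal(size=16), rng.normal()              # hidden -> scalar output

def mask(p):
    # which relus fire at point p: the binary mask that picks the linear piece
    return (p @ W1.T + b1) > 0

def net(p):
    h = np.maximum(p @ W1.T + b1, 0)
    return h @ w2 + b2

# sample the plane and count how many distinct linear pieces the grid touches
xs = np.linspace(-2, 2, 200)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
masks = mask(grid)
n_pieces = len(np.unique(masks, axis=0))
print("distinct linear regions touched:", n_pieces)  # a pile of masks
```

each unique row of `masks` corresponds to one affine function of the input; the net's apparent curvature is just many flat pieces meeting at the relu boundaries.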
Of course, in really high-dimensional models, the shapes they learn can have nested intricacy that mostly captures the shape of the real pattern in the data. but their ability to match the pattern in a way that generalizes correctly can have weirdly spiky holes, which the geometric-illustrations gif demonstrates nicely. and you can see the same thing happen in lower dimensions by breaking the models on purpose, e.g. by messing up a handcrafted linear space folding model:
SHADER: modified version of the 3d one above. ***FLASHING LIGHTS ABOVE 20HZ AT SOME MOMENTS*** - this shader shows in much more visual detail how you can end up with tiny fragments floating all around a high-dimensional space when decision boundaries don't line up coherently with each other. it also animates network depth: the number of fold iterations changes every 6 seconds.
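the depth animation in that last shader has a simple 1d analogue: each fold composes another absolute value, doubling the number of linear pieces. a sketch using the tent map as a stand-in fold (my own toy, not the shader's actual fold), counting pieces by their fold-sign masks:

```python
import numpy as np

def pieces(depth):
    # count the linear pieces of `depth` composed folds by counting the
    # distinct sign patterns ("masks") the folds produce on a dense 1d grid
    x = np.linspace(-1, 1, 4001)
    y = x.copy()
    signs = []
    for _ in range(depth):
        signs.append(y >= 0)       # which side of this fold each point is on
        y = 1 - 2 * np.abs(y)      # tent-map fold: maps [-1, 1] onto itself
    return len(np.unique(np.stack(signs, axis=1), axis=0))

for d in range(1, 7):
    print("depth", d, "->", pieces(d), "linear pieces")  # doubles each fold
```

with aligned folds the pieces tile the line neatly; nudging one fold's offset (what the modified shader does) shifts a whole subtree of those boundaries, which is where the stray fragments come from.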