Thesis: linear diffusion of sparse lognormals contains the explanation for shard-like phenomena in neural networks. The world itself consists of ~discrete, big phenomena. Gradient descent allows those phenomena to make imprints upon the neural networks, and those imprints are what is meant by “shards”.
… But shard theory is still somewhat broken, because it does not consider the possibility that the neural network itself might have an impetus to nudge those shards toward specific outcomes.