Quick feedback: the graph after this paragraph feels sketchy to me. Obviously the singular values are zero beyond 64, and they're so far down that all the singular values above them look identical. But the y axis is screwed up as a result, so you can't really see this. What does the graph look like if you fix it?
Indeed. In retrospect, presenting the graph this way seems to have confused a lot of people, so I have now updated it to cut off at 64 and show only the spectrum up to that point. There we see a clear exponential decay in the singular values all the way down to 64 (though they never become too small), preceded by a slightly faster-than-exponential initial decay. All the code is in the colab, so you can switch it to a linear scale if you want. Personally, I think log scaling tends to make more sense for spectrum graphs, as the spectra are usually exponentials or power laws.
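For anyone who wants to see the effect without opening the colab, here is a minimal sketch of the idea. It is not the colab code: the matrix is a random stand-in, the sizes `d_model = 512` and `d_head = 64` are assumptions chosen so that the rank is 64, and `truncated_spectrum` and `plot_spectrum` are hypothetical helper names. It shows that singular values past the rank are floating-point noise, which is what stretches a log-scaled y axis, and that cutting the spectrum off at the rank leaves only the meaningful values.

```python
import numpy as np


def truncated_spectrum(W, rank):
    """Return the leading `rank` singular values of W, largest first."""
    s = np.linalg.svd(W, compute_uv=False)  # sorted in descending order
    return s[:rank]


def plot_spectrum(s, log_scale=True):
    """Plot a spectrum; pass log_scale=False for a linear y axis."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    fig, ax = plt.subplots()
    ax.plot(np.arange(1, len(s) + 1), s)
    if log_scale:
        ax.set_yscale("log")
    ax.set_xlabel("singular value index")
    ax.set_ylabel("singular value")
    return fig


# Hypothetical sizes: a product of (d_model x d_head) and (d_head x d_model)
# factors has rank at most d_head, like the cutoff at 64 discussed above.
rng = np.random.default_rng(0)
d_model, d_head = 512, 64
W = rng.standard_normal((d_model, d_head)) @ rng.standard_normal((d_head, d_model))

full = np.linalg.svd(W, compute_uv=False)  # all 512 singular values
head = truncated_spectrum(W, d_head)       # only the first 64

# Everything past index d_head is numerical noise, many orders of magnitude
# below the leading value; on a log axis this tail dominates the y range.
tail_ratio = full[d_head:].max() / full[0]
```

Calling `plot_spectrum(full)` and `plot_spectrum(head)` side by side should reproduce the difference between the two versions of the graph, and `plot_spectrum(head, log_scale=False)` gives the linear-scale view.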
Thanks for sharing this! I’m excited to see more interpretability posts. (Though this felt far too high production value—more posts, shorter posts and lower effort per post plz)
Indeed, we will be aiming for more rapid, shorter posts in the near future. Stay tuned.