Just to check: in the toy scenario, we assume the features in R^n are the coordinates in the standard basis, so we have n features X_1, …, X_n?
Yes, that’s correct.
Separately, do you have intuition for why they allow the network to learn b? Why not set b to zero as well?
My understanding is that the bias is thought to be useful for two reasons:
It lets the model output a non-zero value (namely the expected value) for features it chooses not to represent.
A negative bias lets the model zero out small interference terms by shifting them below zero so that the ReLU outputs exactly zero (see the sketch below). I think empirically, when these toy models exhibit lots of superposition, the bias vector typically has many negative entries.
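A minimal numpy sketch of the second point, assuming the toy-model reconstruction x̂ = ReLU(WᵀWx + b); the random W, the unit-norm normalization, and the bias value -0.2 are illustrative choices, not the trained weights from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_hidden = 5, 2           # more features than hidden dimensions
W = rng.normal(size=(n_hidden, n_features))
W /= np.linalg.norm(W, axis=0)        # unit-norm feature directions (illustrative)

def reconstruct(x, b):
    """Toy-model reconstruction: x_hat = ReLU(W^T W x + b)."""
    return np.maximum(W.T @ (W @ x) + b, 0.0)

x = np.zeros(n_features)
x[0] = 1.0                            # only feature 0 is active

# With zero bias, interference from the other superposed features
# (the dot products w_i · w_0) leaks through the ReLU wherever it is positive.
print(reconstruct(x, b=np.zeros(n_features)))

# A small negative bias pushes those interference terms below zero, so the
# ReLU clips them to exactly zero, while the active feature (value 1) survives.
print(reconstruct(x, b=-0.2 * np.ones(n_features)))
```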