delicate symmetry-breaking which can only come from either the training procedure or noise in the data, rather than the model itself
I’m still not convinced. The pointwise nonlinearities introduce a preferred basis, and cause the individual hidden units to be much more meaningful than linear combinations thereof.
Yeah; I discussed this with some others and came to the same conclusion. I do still think that one should explain why the preferred basis ends up being as meaningful as it does, but agree that this is a much more minor objection.
I’m still not convinced. The pointwise nonlinearities introduce a preferred basis, and cause the individual hidden units to be much more meaningful than linear combinations thereof.
Yeah; I discussed this with some others and came to the same conclusion. I do still think that one should explain why the preferred basis ends up being as meaningful as it does, but agree that this is a much more minor objection.