The typical noise on feature caused by 1 unit of activation from feature , for any pair of features , , is (derived from Johnson–Lindenstrauss lemma)
1. … This is a worst-case scenario. I have not calculated the typical case, but I expect it to be somewhat less, though still of the same order of magnitude.
Perhaps I’m misunderstanding your claim here, but the “typical” (i.e. RMS) inner product between two independent random unit vectors in ℝ^n is 1/√n. So I think the shouldn’t be there, and the rest of your estimates are incorrect.
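The RMS claim is easy to sanity-check numerically. Below is a minimal Monte Carlo sketch, assuming the feature directions are drawn uniformly from the unit sphere. Here `n` is my own label for the embedding dimension, and the function names are made up for illustration. For independent random unit vectors the expected squared inner product is exactly 1/n, so the estimate should land close to 1/√n.

```python
import math
import random


def random_unit_vector(n, rng):
    """Draw a vector uniformly from the unit sphere in R^n (normalised Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rms_inner_product(n, num_pairs=2000, seed=0):
    """Estimate the RMS inner product between independent random unit vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_pairs):
        u = random_unit_vector(n, rng)
        v = random_unit_vector(n, rng)
        dot = sum(a * b for a, b in zip(u, v))
        total += dot * dot
    return math.sqrt(total / num_pairs)


if __name__ == "__main__":
    for n in (16, 64, 256):
        # Second column: Monte Carlo estimate; third column: 1/sqrt(n).
        print(n, rms_inner_product(n), 1 / math.sqrt(n))
```

This only checks the typical pairwise overlap between random directions. It says nothing about the worst case over all pairs, which is what a Johnson–Lindenstrauss style bound controls.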
This means that we can have at most simultaneously active features
This conclusion gets changed to .
The peaks at 0.05 and 0.3 are strange. What regulariser did you use? Also, could you check whether all features whose nearest neighbour has cosine similarity 0.3 have the same nearest neighbour (and likewise for 0.05)?
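For concreteness, here is roughly the check I have in mind, as a sketch rather than anything tied to your actual code. It assumes the learned feature directions are available as a list of vectors (`features` is a hypothetical name), finds each feature's nearest neighbour by signed cosine similarity, and counts which neighbours occur among the features sitting at a given peak. The tolerance `tol` is an arbitrary choice.

```python
import math
from collections import Counter


def nearest_neighbours(features):
    """For each feature, return (index of nearest neighbour, cosine similarity)."""
    norms = [math.sqrt(sum(x * x for x in f)) for f in features]
    result = []
    for i, fi in enumerate(features):
        best_j, best_sim = None, -math.inf
        for j, fj in enumerate(features):
            if i == j:
                continue  # a feature is not its own neighbour
            sim = sum(a * b for a, b in zip(fi, fj)) / (norms[i] * norms[j])
            if sim > best_sim:
                best_j, best_sim = j, sim
        result.append((best_j, best_sim))
    return result


def neighbours_at_peak(features, peak, tol=0.02):
    """Count the nearest neighbours of features whose nearest-neighbour
    cosine similarity lies within tol of peak."""
    counts = Counter()
    for j, sim in nearest_neighbours(features):
        if abs(sim - peak) <= tol:
            counts[j] += 1
    return counts
```

If `neighbours_at_peak(features, 0.3)` is dominated by a single index, then the peak comes from many features sharing one neighbour rather than from many independent pairs. The same call with `0.05` covers the other peak.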