I agree with a lot of this; we discuss the trickiness of measuring this properly in the paper (Appendix E.1), and I touched on it a bit in this post (the last bullet point in the last section). We did consider normalizing by the L2, but ultimately decided against it because the L2 indexes too heavily on the size of the majority of elements rather than on the size of the largest ones, so it’s not really what we want. Fwiw, I think normalizing by the L4 or the L_inf is more promising.
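To make that concrete, here's a quick toy illustration (not from the paper; the vector and its values are made up for the sake of the example): with lots of small "bulk" entries plus a couple of large outliers, the L2 norm is mostly determined by the bulk, while the L4 and L_inf norms are mostly determined by the largest entries.

```python
import numpy as np

# Toy activation vector: many small "bulk" entries plus a few large outliers.
bulk = np.full(10_000, 0.5)        # the majority of elements
outliers = np.array([5.0, 4.0])    # the largest elements we actually care about
x = np.concatenate([bulk, outliers])

for ord_, label in [(2, "L2"), (4, "L4"), (np.inf, "L_inf")]:
    norm = np.linalg.norm(x, ord=ord_)
    # Fraction of the norm attributable to the outliers alone.
    outlier_share = np.linalg.norm(outliers, ord=ord_) / norm
    print(f"{label}: norm = {norm:.2f}, outlier share = {outlier_share:.0%}")
```

Running this, the outliers account for only ~13% of the L2 norm but ~87% of the L4 norm and 100% of the L_inf norm, which is why normalizing by L2 mostly tracks the bulk rather than the largest elements.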
I agree it would be good for us to report more data on the pre-trained vs. randomized comparison specifically. I don’t really see that as a central claim of the paper, so I didn’t prioritize putting much material on it in the appendices, but I might do a revision with more stats on it, and I really appreciate the suggestions.