Do you have any recommendations for running HDBSCAN efficiently on high dimensional neural net activations? I’m using the Python implementation and just running the algorithm on GPT-2 small’s embedding matrix is unbearably slow.
UPDATE: The maintainer of the repo says it’s inadvisable to use the algorithm (or any other density-based clustering) directly on data with as many as 768 dimensions, and recommends using UMAP first. Is that what you did?
Hi Nora. We used RAPIDS’s cuML implementation, which runs on the GPU. Beware: despite what the docs say, the only metric actually supported is “euclidean” (issue).
Oh cool, this will be really useful, thanks!