Keenan Pepper comments on MATS Applications + Research Directions I’m Currently Excited About

Keenan Pepper 28 Feb 2025 0:29 UTC
1 point
- Improving our current techniques for using LLMs to interpret SAE latents
As far as you’re aware, is there any autointerp work that’s based on actively steering (boosting/suppressing) the latent to be labeled and generating completions, rather than searching a dataset for activating examples?
- Neel Nanda 28 Feb 2025 5:51 UTC
  3 points
  There probably is, but I can’t think of anything offhand.
- Keenan Pepper 28 Feb 2025 0:33 UTC
  1 point
  Hmm, there is a related thing called “intervention scoring” (https://arxiv.org/abs/2410.13928), but it appears to be used only for scoring descriptions produced by the traditional method, not for using interventions to generate the descriptions in the first place.