I’d be keen to see the TEXTSPAN method applied to the attention heads of CLIP’s text encoder.
It’d also be interesting to see the same applied to the audio encoder of CLAP. I’m really curious to hear your thoughts on mech interp efforts in the audio space; it seems to be largely ignored.
P.S.: Thank you for the excellent post.