Hey Neel! I just wanted to say thank you for writing this. It’s honestly one of the most grounded and helpful takes I’ve seen in a while. I really appreciate your pragmatism, and the way you frame interpretability as a useful tool that still matters for early transformative systems (and real-world auditing!).
Quick question: do you plan to share more resources or thoughts on how interpretability can support black-box auditing and benchmarking for safety evaluations? I’m thinking a lot about this in the context of the General-Purpose AI Codes of Practice and how we can build technically grounded evaluations into policy frameworks.
Thanks again!
Thanks, that’s very kind!
I don’t have any current plans, sorry