For people looking for MATS-like programs in other locations, with different timelines, etc., this page is a great resource for finding other training programs, a number of which (PIBBSS, Pivotal, LASR Labs, and others) include mech interp research: https://www.aisafety.com/map
Matthew Shinkle
Karma: 34
Automating AI Safety: What we can do today
Hello! Long-time lurker, planning to post research results on here in the near future. I’m currently a PIBBSS research fellow, working on LLM interpretability relating to activation plateaus and deception probes. I’ll be joining Anna Leshinskaya’s Relational Cognition lab in the fall as a postdoc, working on moral reasoning in LLMs. Feel free to reach out if you have any ideas, questions, etc. on any of these topics!
PIBBSS definitely does some mech interp, and I believe AI Safety Camp has some mech interp projects.