Mechanistic Interpretability Reading Group

Greetings.

This is an open invitation for folks interested in technical AI alignment literature to join our reading group, which has been running for 22 weeks now. Our meetings take place online via the Mechanistic Interpretability Group Discord server, accessible through this invite link.

Each week we shortlist 4-5 papers, occasionally around a central theme, and members vote on which one they’d like to read and discuss. The most popular option becomes the next week’s reading. You can find a list of the papers we’ve already read, as well as some of the upcoming papers, at https://mechinterp.com/reading-group.

In addition to paper discussions, we occasionally host authors for Q&A sessions. This week, we will be reading Studying Large Language Model Generalization with Influence Functions (arXiv). One of the authors, Juhan Bae, will be joining us for a Q&A session on September 27th at 5pm GMT.

We usually meet on Wednesdays at 5pm GMT (10am PT), and sessions typically run for an hour. If you can’t make that time, we’ve recently started recording some of our sessions and transcribing them with Whisper. The recordings and transcripts are available here.

We welcome anyone interested to join us in our discussions. Feel free to reach out if you have any questions or need further information. The best way to reach us is through the Discord server.