Open Call for Research Assistants in Developmental Interpretability
This is a chance to gain expertise in interpretability, develop your skills as a researcher, build out a network of collaborators and mentors, publish in major conferences, and open a path towards future opportunities, including potential permanent roles, recommendations, and successive collaborations.
Developmental interpretability is a research agenda that aims to build tools for detecting, locating, and understanding phase transitions in the learning dynamics of neural networks. It draws on techniques from singular learning theory, mechanistic interpretability, statistical physics, and developmental biology.
Title: Research Assistant.
Location: Remote, with hubs in Melbourne and London.
Duration: Until March 2024 (at minimum).
Compensation: Base salary of USD$35k per year, paid at an hourly rate. You would be engaged as an independent contractor.
Application Deadline: September 15th, 2023
Ideal Start Date: October 2023
How to Apply: Complete the application form by the deadline. Further information on the application process will be provided in the form.
Who We Are
The developmental interpretability research team consists of experts across a number of areas of mathematics, physics, statistics, and AI safety. The principal researchers are:
Daniel Murfet, mathematician and SLT expert, University of Melbourne.
Susan Wei, statistician and SLT expert, University of Melbourne.
Jesse Hoogland, MSc in Physics, SERI MATS scholar, RA in the Krueger lab.
We have a range of projects currently underway, each led by one of these principal researchers and involving a number of PhD and MSc students from the University of Melbourne as well as collaborators from around the world. In an organizational capacity, you would also interact with Alexander Oldenziel and Stan van Wingerden.
You can find us and the broader DevInterp research community on our Discord. Beyond the Developmental Interpretability research agenda, you can read our first preprint on scalable SLT invariants and check out the lectures from the SLT & Alignment summit.
Overview of Projects
Here’s a selection of the projects underway, some of which you would be expected to contribute to. These tend to be on the more experimental side:
Developing scalable estimates for SLT invariants: Invariants like the (local) learning coefficient and (local) singular fluctuation can signal the presence of “hidden” phase transitions. Improving these techniques can help us better identify these transitions.
DevInterp of vision models: To what extent do the kinds of circuits studied in the original circuits thread emerge through phase transitions?
DevInterp of program synthesis: In examples where we know there is rich compositional structure, can we see it in the singularities? Practically, this means studying settings like modular arithmetic (grokking), multitask sparse parity, and more complex variants.
DevInterp of in-context learning & induction heads: Is the development of induction heads a proper phase transition in the language of SLT? More ambitiously, can we apply singular learning theory to study in-context learning and make sense of “in-context phase transitions”?
DevInterp of language models: Can we detect phase transitions in simple language models (such as those trained on TinyStories)? Can we, from these transitions, discover circuit structure? Can we extend these techniques to larger models (e.g., in the Pythia suite)?
DevInterp of reinforcement learning models: To what extent are phase transitions involved in the emergence of a world model (in examples like OthelloGPT)?
Next to these, we are working on a number of more theoretical projects. (Though our focus is on the more applied projects, if one of these particularly excites you, you should definitely apply!)
Studying phase transitions in simple models like deep linear networks: These serve as a valuable intermediate case between regular models and highly singular models. This is also a place to draw connections to the existing deep learning theory literature (e.g., on “saddle-to-saddle dynamics”).
Developing “geometry probes”: It’s not enough to detect phase transitions; we also have to be able to analyze them for structural information. Here the aim is to develop “probes” that extract this kind of information from phase transitions.
Developing the “geometry of program synthesis”: We expect that understanding neural networks will require looking beyond models of computation like Turing machines, lambda calculus, and linear logic, towards geometric models of computation. This means pushing further in directions like those explored by Tom Waring in his MSc thesis.
Taken together, these projects complete the scoping phase of the DevInterp research agenda, ideally resulting in publications in venues like ICML and NeurIPS.
What We Expect
You will be communicating about your research on the DevInterp Discord, writing code and training models, and attending research meetings over Zoom. In general, you will be a productive contributor to a fast-moving research team in which theoreticians and experimentalists work together to define the future of the science of interpretability.
Depending on interest and background, you may also be reading and discussing papers from ML or mathematics and contributing to the writing of papers on Overleaf. It’s not mandatory, but you would be invited to join virtual research seminars like the SLT seminar at metauni or the SLT reading group on the DevInterp Discord.
There will be a DevInterp conference in November 2023 in Oxford, United Kingdom, and it would be great if you could attend. There will hopefully be a second opportunity to meet the team in person between November and the end of the employment period (possibly in Melbourne, Australia).
Who is this for?
We’re looking mainly for people who can do engineering work, that is, people with software development and ML skills. It’s not necessary to have a background in interpretability or AI safety, although that’s a plus. Ideally you have legible output or projects that demonstrate your ability as an experimentalist.
What’s the time commitment?
We’re looking mainly for people who can commit full-time, but if you’re talented and only available part-time, don’t shy away from applying.
What does the compensation mean?
We’ve budgeted USD$70k in total to be spread across 1-4 research assistants over the next half year. By default we’re expecting to pay RAs USD$17.50/hour.
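As a rough sanity check on how these figures fit together, here is a minimal sketch that uses only the numbers quoted in this post. The function names are purely illustrative, and "the next half year" is assumed to mean exactly 0.5 years.

```python
# Figures quoted in this post.
ANNUAL_BASE_USD = 35_000   # base salary per year
HOURLY_RATE_USD = 17.50    # default hourly rate
TOTAL_BUDGET_USD = 70_000  # total budget across all RAs
PERIOD_YEARS = 0.5         # assumption: "half year" taken as exactly 0.5 years


def implied_hours_per_year(annual_usd: float, hourly_usd: float) -> float:
    """Hours per year at which the hourly rate adds up to the annual base."""
    return annual_usd / hourly_usd


def cost_per_full_time_ra(annual_usd: float, period_years: float) -> float:
    """Cost of one full-time RA over the given period."""
    return annual_usd * period_years


def full_time_ras_funded(budget_usd: float, annual_usd: float, period_years: float) -> float:
    """Number of full-time RAs the budget covers over the given period."""
    return budget_usd / cost_per_full_time_ra(annual_usd, period_years)


if __name__ == "__main__":
    print(implied_hours_per_year(ANNUAL_BASE_USD, HOURLY_RATE_USD))
    print(cost_per_full_time_ra(ANNUAL_BASE_USD, PERIOD_YEARS))
    print(full_time_ras_funded(TOTAL_BUDGET_USD, ANNUAL_BASE_USD, PERIOD_YEARS))
```

On these numbers, USD$17.50/hour corresponds to 2,000 hours per year (roughly 40 hours a week for 50 weeks), one full-time RA costs USD$17.5k over the half year, and the budget covers up to four full-time RAs, which is consistent with the 1-4 range above.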
Do I need to be familiar with SLT and AI alignment?
No (though it’s obviously a plus).
We’re leaning towards taking on skilled general purpose experimentalists (without any knowledge of SLT) over less experienced programmers who know some SLT. That said, if you are a talented theorist, don’t shy away from applying.
What about compute?
In the current phase of the research agenda, the projects are not extremely compute-intensive. The necessary cloud GPU access will be provided.
What are you waiting for?