Public Call for Interest in Mathematical Alignment
Bottom line up front:
If you are currently working on, or are interested in working in, any area of mathematical AI alignment, we are collecting names and basic contact information so we know who to talk to about opportunities in these areas. If that describes you, please fill out the form! (Please do so even if you think I already know who you are, or people will be left out!)
More information
There are several concrete research agendas in mathematical AI alignment, receiving varying degrees of ongoing attention, with relevance to different possible strategies for AI alignment. These include MIRI’s agent foundations and related work, Learning Theoretic Alignment, Developmental Interpretability, Paul Christiano’s theoretical work, RL theory related work done at Far.AI, FOCAL at CMU, Davidad’s “Open Agency” architecture, as well as other work. Currently, as in the past, work in these areas has been conducted mainly in non-academic settings, often not published, and the people involved are scattered—as are other people who want to work on this research.
A group of people, including some individuals at MIRI, Timaeus, MATS, ALTER, PIBBSS, and elsewhere, are hoping both to promote research in these areas and to build bridges between academic and existing independent research. To that end, we are hoping to promote academic conferences, hold or sponsor attendance at research seminars, and announce opportunities and openings for PhD students or postdocs, non-academic positions doing alignment research, and similar.
As a first step, we want to compile a list of people who are (at least tentatively) interested and would be happy to hear about projects. This list will not be public, and we expect to send very few emails to it, but it will be used to find individuals who might want to be invited to programs or opportunities.
Note that we are interested in people at all levels of seniority, including graduate students, independent researchers, professors, research groups, university department contacts, and others who wish to be informed about future opportunities and programs.
Interested in collaborating?
If you are an academic, or are otherwise specifically interested in building bridges to academia or collaborating with people in these areas, please mention that in the notes; we are happy to be in touch with you, or to help you contact others working in the narrower areas you are interested in.
If I imagine being an undergraduate student who's interested, this post leaves me unclear on whether I should fill out the form.
We are focused on mathematical research and on building bridges between academia and existing independent research. I think the pathway to doing that type of research is usually through traditional academic channels, a PhD program, or perhaps a master's degree or a program like MATS, at which point the type of research promotion and academic bridge-building we are focused on becomes far more relevant. That said, we do have undergrad as an option, and are certainly OK with people at any level of seniority signaling their interest.
For my own clarity: What is the difference between mathematical approaches to alignment and other technical approaches like mechanistic interpretability work?
I imagine the focus is on in-principle arguments or proofs regarding the capabilities of a given system rather than empirical or behavioural analysis, but you mention RL, so I just wanted to get some colour on this.
Any clarification here would be helpful!
You are more or less right. By “mathematical approaches”, we mean approaches focused on building mathematical models relevant to alignment/agency/learning and finding non-trivial theorems (or at least conjectures) about these models. I’m not sure what the word “but” is doing in “but you mention RL”: there is a rich literature of mathematical inquiry into RL. For a few examples, see everything under the bullet “reinforcement learning theory” in the LTA reading list.
Thanks for the pointer! Yes, RL has a lot of research of this kind; as an empirical researcher I just get stuck sometimes in translation.
Don’t forget Orthogonal’s mathematical alignment research, including QACI!
Thanks—and the fact that we don’t know who is working on relevant things is exactly the reason we’re doing this!
Part of ACS's research directions fits into this: Hierarchical Agency, Active Inference-based pointers to what alignment means, and Self-unalignment.
Thanks for setting this up!