Update on Harvard AI Safety Team and MIT AI Alignment

Xander Davies, Sam Marks, kaivu, tlevin, leni, maxnadeau and Naomi Bashkansky

2 Dec 2022 0:56 UTC


We help organize the Harvard AI Safety Team (HAIST) and MIT AI Alignment (MAIA), and are excited about our groups and the progress we’ve made over the last semester.

In this post, we’ve attempted to think through what worked (and didn’t work!) for HAIST and MAIA, along with more details about what we’ve done and what our future plans are. We hope this is useful for the many other AI safety groups that exist or may soon exist, as well as for others thinking about how best to build community and excitement around working to reduce risks from advanced AI.

Important things that worked:

Well-targeted outreach, which (1) focused on the technically interesting parts of alignment (rather than its altruistic importance), and (2) leveraged informal connections with networks and friend groups.
HAIST office space, which was well-located and very useful for running HAIST’s programming and co-working.
Well-contextualized leadership, with many of the people involved in running HAIST/MAIA programming having experience with AI safety research (including nearly all of the facilitators for our reading groups).
High-quality, scalable weekly reading groups, including 13 sections of introductory reading groups, 2 science of deep learning reading groups, 2 policy reading groups, and general member reading groups for HAIST and MAIA.
Significant time expenditure, including mostly full-time attention from several organizers.

Important things we got wrong:

Poor retention for MAIA programming, perhaps due to starting this programming too late in the semester.
Excessive focus on intro programming, which cut against ML engineering programming and advanced reading groups for more seasoned members.

If you’re interested in supporting the alignment community in our area, the Cambridge Boston Alignment Initiative is currently hiring.

What we’ve been doing

HAIST and MAIA are concluding a 3-month period during which we expanded from one group of about 15 Harvard and MIT students who read AI alignment papers together once a week to two large student organizations that:

Ran a large AI safety intro fellowship (organized by Sam Marks and adapted from the AGI Safety Fundamentals program) that attracted over 230 applicants and enrolled about 130 in 13 weekly reading groups, all facilitated by people with experience in AI safety research (some of whom are students). About 60 participants have continued attending as of this post, including undergraduates, grad students, and postdocs in math and computer science.
- This wound up taking up most of our focus, which was not the original plan (we planned to spend more time on ML up-skilling and supporting research). This pivot was mostly intentional (we got a higher number of great applicants than expected), but we are worried about continually prioritizing our introductory program in the future (which we discuss further below).
Opened the HAIST office (with the significant help of Kaleem Ahmid, Frances Lorenz, and Madhu Sriram), which has become a vibrant coworking space for alignment research and related work. We plan to open a MAIA office, officially starting in February 2023.
Launched the MAIA/HAIST Research Fellows program (organized by Oam Patel), which paired 20 undergraduate and graduate students with AI safety and governance research mentors.
Started a Science of Deep Learning reading group at MAIA (organized by Eric Michaud), with around 8 active participants (more information about the reading group here). This program ended up being good for experienced member engagement and generating research ideas, but didn’t perform as well as an outreach mechanism, which was our initial intention for it.
Ran two retreats (organized by Trevor Levin and Kuhan Jeyapragasan), with a total of 85 unique attendees, including many of our most engaged intro fellows, discussion group facilitators, research mentors, and guests from Redwood Research, OpenAI, Anthropic, Lightcone, Global Challenges Project, and Open Philanthropy. We think these retreats were unusually impactful (even compared to other retreats), with multiple participants at each indicating that they were significantly more likely to pursue careers in AI alignment research or related fields (governance/policy, outreach) after the retreat, and many expressing interest in and following up regarding continued involvement with (and in some cases organizing for) HAIST and MAIA.
Ran two weekly AI governance fellowships with 15 initial and 14 continuing participants.
Hosted Q&As with Daniel Ziegler, Tom Davidson, Chris Olah, Richard Ngo, and Daniel Kokotajlo.
Ran HAIST’s weekly member meetings, where we read alignment-relevant research (e.g. 1) and began MAIA member meetings.
Facilitated a debate on the risks and benefits of research on Reinforcement Learning from Human Feedback, and are working on producing an adversarial collaboration document (headed by Adam Jermyn) summarizing our debate.
Added weekly socials (organized by Naomi Bashkansky) hosted at the HAIST office where people new to alignment mingle with more experienced people.
Started an AI forecasting group, with talks, workshops with AI forecaster Tamay Besiroglu, and friendly competitions on Fermi estimations and pastcasting.
Are organizing an MLAB-inspired ML bootcamp in January 2023 in partnership with the Cambridge Boston Alignment Initiative, to which current students should apply by December 4th if they are interested in AI safety but have little ML experience.

What worked

Communication & Outreach Strategy

Outreach targeting the most promising students with technical backgrounds (and leveraging informal friend networks).
- Both HAIST and MAIA actively promoted our programs on the course sites/Slacks/mailing lists of relevant advanced CS and math classes, undergrad majors/graduate programs, and relevant student groups.
- Getting help on our outreach strategy from well-positioned members of these social networks (created in part through shared problem-set groups and friend groups with similar majors/extra-curricular/research interests) and asking them to recommend our programs to their peers, especially through direct messages.
Emphasizing the technical aspect and interestingness of AI alignment (over just its ethical importance). As we noted in our announcement post, we want AI safety to be motivated not just by mitigating existential risk or effective altruist considerations, but also as one of the most interesting, exciting, and important problems humanity faces. We continued pitching our programs primarily as ways to explore unique and interesting technical problems that also happen to be extremely important (rather than primarily as a means to social impact). We think this worked well and should be replicated. That being said, we think it is important for group members to engage with the impacts advanced AI could have and its implications for humanity, and we address these in our programs, social contexts, and other programming.
Good digital communications and copy. We (mostly Xander) put substantial effort into nailing the wording of our emails and Slack messages, including customizing them for different audiences, and we’re happy with what we wound up with. We also like our websites. If you’re doing similar outreach, feel free to reach out to Xander at xanderlaserdavies@gmail.com for resources and advice.
Special attention to the most engaged and skilled (in relevant domains) participants. We put participants who combined especially strong engagement with the ideas and strong technical skills in touch with top professionals and organizations. Newer students have often cited chatting with professionals (1:1 chats, talks/Q&As at retreats, external connections) as highly important for getting more involved.

Operations

Active office in convenient location: Getting collective buy-in to use the office regularly (for default working, socializing, and AI safety programming), and investing effort into making the space fun and convenient to use helped improve programming, social events, and sense of community. We think the office facilitated many more interactions between group members (and with intro fellows) than would have occurred without it.
Smooth participant experience (through high-effort background organizing costs). We put effort into making the participant experience in programs strong—ensuring that discussions take place with the necessary materials (food, printed readings) in place, the rooms booked, the facilitators on time, and especially engaged participants followed up with. Starting, advertising, and running these programs, opening an office, and running two large retreats involved over a dozen organizers contributing >10 hours a week. (We know smaller groups will not have this kind of capacity, so we should note that we think it was important to make one or two of our core programs great before we significantly expanded.)

Pedagogy

Finding excellent, well-contextualized facilitators. All but one of the facilitators for our reading groups have done research on the topic of the group. Most were PhD students; some had finished PhDs or were otherwise full-time professional researchers; some had worked in relevant research groups, orgs, or labs. We think this increased the educational quality of the groups, improved discussions, and lent substantial credibility and professionalism to the programs. This probably resulted in part from confirming most facilitators over the summer.
Basing our intro fellowship on Richard Ngo’s AGISF curriculum. Though we made various adaptations (see below), we were very fortunate to have been starting from an extremely high-quality baseline. Any curriculum we had tried to make from scratch would likely have been significantly worse.
Putting special effort into selecting pedagogically-valuable reading materials for intro fellowships. Not all explanations of the same idea are equally reader-friendly, especially for people learning about alignment for the first time. For intro fellowship sections that met early in the week (before the other sections met to cover the same material), we tried to pay close attention to how they responded to the readings. When they weren’t getting much out of a reading, we did our best to substitute it with a clearer write-up of the same topic (or sometimes, of a new topic altogether) before the other sections met. We sometimes also ran experiments, giving one reading to some sections and a different reading to others. We thought that the additional boost in reading quality helped keep participants engaged.
Having reading groups do readings in-meeting. A standard way to run reading groups is to ask participants to read materials outside of meetings, and then spend the entire meeting discussing. We have found that longer meetings, which provide time for eating and doing the readings in person, result in much better reading comprehension (possibly because participants aren’t rushing to finish readings before the meeting) and much higher-quality discussion (since readings are fresh in participants’ minds). (A common concern, that longer meetings tire out participants more, seems not to have materialized, possibly because alternating between reading and discussion helps keep participants alert.) To this end, we adapted the AGISF curriculum (originally 7 weeks of 1.5-hour meetings) to span 10 weeks of two-hour meetings, with all readings done in-meeting.

After we incorporate a final round of participant feedback, we’ll release our final adaptation of the AGISF curriculum, structured as 9 weeks of two-hour meetings, and with various minor curricular substitutions.

Mistakes/Areas for Improvement

High attrition rates in our MIT programs. Our MIT programs had significantly higher attrition rates. We’re still figuring out why, but reasons might include lack of office space, a later start, lower rates of friend-groups taking part together, and an MIT-specific aversion to the reading group format, each of which we will try to fix next semester.
Insufficient focus on programming exciting for group members/organizers, and too much focus on intro-friendly programs. For example, MAIA neglected running advanced meetings for already engaged members at MIT until late in the term, which also hindered the kind of strong community formation we saw at Harvard. This was understandable given that it was fall semester and the groups were new, but we somewhat fell into the trap of trying to appeal to newer students at the expense of making group involvement fun for experienced students interested in alignment (especially at MIT).
Inadequate task management and organizational structure. Next semester, we’ll plan this out more to reduce organizer stress, ambiguity, redundant work, and communication costs (e.g. organizers not knowing who was doing what).
Lack of office space at MIT. MAIA suffered substantially from not having a physical office. Whereas almost all HAIST meetings took place in the office, making the operations easy and the atmosphere professional and legitimate, MAIA meetings were scattered throughout MIT classrooms.
Late launch of the Research Fellowship. The research program did not officially launch until late October (since we weren’t planning on running one until late), meaning most students did not do much research. We also didn’t invest enough effort into getting existing HAIST/MAIA members to work on research during term.
Taking too long to get to more technical material during our intro fellowship. For example, many intro fellows identified the material on recent interpretability work (e.g. circuits, causal tracing, and superposition) as their favorite part of the intro fellowship. But this material didn’t come until the 7th and 8th weeks, after many participants had already dropped out! (Other fan favorites include specification gaming and goal misgeneralization.) In redrafting the intro fellowship curriculum, we’re looking for ways to introduce this technical material sooner.

Next Steps/Future Plans

At this stage, we’re most focused on addressing mistakes and opportunities for improvement on existing programming (see above). Concretely, some of our near-term top priorities are:

Setting up office space for MAIA.
Setting up and sharing infrastructure and resources for AI alignment university programming with organizers at other universities (e.g., our technical and governance curricula).
Improving our programming for already engaged students (e.g. paper implementation groups, an Alignment 201 program, research opportunities, ML skill-building opportunities, etc.).
Creating and sharing opportunities for extended engagement (strong overlap with the above), especially over winter and summer breaks.

How You Can Get Involved

Mentor + advise junior researchers/students (remotely and in person). Following this semester’s successes, we are likely to have many more junior members who are interested in and capable of helping with alignment and governance research than mentors to support them. Contact Xander or Kuhan to express interest (on the Forum, FB messenger, or email—xanderlaserdavies@gmail.com and kuhanjey@gmail.com).
Visit us, especially during retreats. Several retreat guests got very positive feedback from attendees and, we think, accelerated several careers. Other researchers who were not able to join for retreats gave well-received Q&As at our office (see the Q&As bullet in the summary). Online talks + Q&As are also welcome.
Give us feedback, whether via the email addresses above, in the comments, or at EAGxBerkeley this weekend.
Apply to our MLAB-inspired ML bootcamp in January 2023 in partnership with the Cambridge Boston Alignment Initiative.
If you’re interested in supporting the alignment community in our area, the Cambridge Boston Alignment Initiative is currently hiring.
