A Proposal for a Better ARENA: Shifting from Teaching to Research Sprints

TLDR

I propose restructuring the current ARENA program, which primarily focuses on contained exercises, into a more scalable and research-engineering-focused model consisting of four one-week research sprints preceded by a dedicated “Week Zero” of fundamental research engineering training. The primary reasons are:

  • The bottleneck for creating good AI safety researchers isn’t the kind of knowledge contained in the ARENA notebooks, but the hands-on research engineering and research skills involved in day-to-day research.

  • I think that, in the current state of the AI safety ecosystem, ARENA primarily functions as a signaling mechanism.

[Edit: as discussed in the comments, on reflection the scalability is not a primary issue or benefit.]

Context and disclaimers

  • This post was dictated using Superwhisper and then turned into blog-post format by Gemini. I have done some light editing. Some of this might read like AI slop. I apologize, but I think the post is valuable as is, and it is not a good use of my time to refine it further.

  • I am not saying that ARENA is not valuable. ARENA is obviously valuable, and deserves the high reputation it has in the AI safety ecosystem.

  • Why am I well positioned to think about this? In the past year and a half, I have been involved in a large number of AI safety programs, both as a participant and as a teacher or lead: ML4Good (participant and TA), SPAR (participant), AI Safety Camp (project lead), ARENA (participant and TA), Algoverse (mentor and participant), BlueDot (participant and facilitator), and ARBOx (TA). Furthermore, I am currently a research manager at MATS, so I am getting a close-up view of the skills required to do high-quality AI safety research.

  • The views expressed here are my own and do not necessarily reflect the views of MATS.

The Core Problem with the Current ARENA

My primary concern is that the skills learned in the current ARENA program are not the bottleneck for the AI Safety ecosystem.

  • Skills Mismatch: AI safety research involves self-directed coding (with LLMs), making decisions about experimental design, setting up infrastructure, research taste, etc. In contrast, ARENA exercises are typically small, well-contained, and have black-and-white correct answers with pre-provided unit tests, removing the crucial element of uncertainty and decision-making present in real research.

  • Signaling vs. Upskilling: Based on my experience, the biggest benefit of the current program to the AI safety community appears to be as a signaling mechanism for other programs. Two pieces of evidence: first, many ARENA participants have already done AI safety research before participating; second, at least four ARBOx participants (ARBOx being a 2-week compressed version of ARENA) are now doing elite AI safety fellowships (1 Anthropic Fellows Program, 2 LASR Labs, 1 MATS).

  • Scalability Bottleneck: ARENA is fundamentally not scalable due to its reliance on TAs and the hands-on teaching model. MATS, for example, is scaling much faster (200+ people/year) compared to ARENA (approx. 75 people/year at 25 participants/cohort, three times/year).

The Proposed Research Sprint Format

The alternative program structure would be a four-week sequence of mini-research sprints, with each week having a different AI safety theme, plus an introductory Week Zero. This aligns with the advice from researchers like Neel Nanda on upskilling in mechanistic interpretability—study the relevant material, then start mini-sprints.

Application Process: ARENA Knowledge as a Prerequisite

The content of the existing ARENA notebooks could be a prerequisite for the new program.

  • Automated Testing: The application process would involve a test of familiarity with the content, possibly using automated quizzes or Anki flashcards created by the ARENA team (a minimal sketch of what such a check could look like follows this list). This removes the reliance on TAs, whose marginal value will only diminish as LLMs improve at explanation, and frees up staff time.

  • Standard Selection: Other standard selection criteria used by programs like SPAR, Algoverse, and AI Safety Camp would still apply.
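As a rough illustration of the automated-testing idea above, here is a minimal sketch of what an auto-graded prerequisite quiz could look like. Everything here is hypothetical: the question file, its format, and the 80% threshold (borrowed from the example later in this post) are assumptions, not an existing ARENA tool.

```python
import json
import random

PASS_THRESHOLD = 0.8  # hypothetical: 80% pass rate, as in the example later in this post


def load_question_bank(path: str) -> list[dict]:
    """Load multiple-choice questions written from the ARENA notebooks.

    Each entry is assumed to look like:
    {"question": "...", "choices": ["a", "b", "c", "d"], "answer": 2}
    """
    with open(path) as f:
        return json.load(f)


def run_quiz(questions: list[dict], n: int = 30) -> float:
    """Ask n randomly sampled questions on the command line and return the score."""
    sampled = random.sample(questions, k=min(n, len(questions)))
    correct = 0
    for q in sampled:
        print(q["question"])
        for i, choice in enumerate(q["choices"]):
            print(f"  [{i}] {choice}")
        answer = int(input("Your answer: "))
        if answer == q["answer"]:
            correct += 1
    return correct / len(sampled)


if __name__ == "__main__":
    bank = load_question_bank("arena_prereq_questions.json")  # hypothetical file
    score = run_quiz(bank)
    print(f"Score: {score:.0%} - {'pass' if score >= PASS_THRESHOLD else 'fail'}")
```

In practice the questions would presumably be written (or generated and reviewed) by the ARENA team from the existing notebooks, and the check could run in a web form rather than on the command line.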

Program Structure

  • Week Zero: Research Engineering & Soft Skills. Dedicated training on modern tools, workflows, and non-technical skills.

  • Week 1: Mech Interp Sprint. Focused one-week research project.

  • Week 2: Evals Sprint. Focused one-week research project.

  • Week 3: Fine-Tuning/RL Model Organisms Sprint. Focused one-week research project.

  • Week 4: Choice/Software Engineering Sprint. Participants choose a deep-dive topic or contribute to open-source packages.

Week Zero: Dedicated Training

The goal for this (optional) week is to teach the actual skills needed for research.

  • Engineering Workflows: Go over fundamental modern research engineering skills. This includes setting up infrastructure, estimating GPU needs (see the worked example after this list), making use of LLMs for coding (e.g., Claude Code), sandboxing with Docker, etc. One version of this is simply spending a day or two going through all the relevant tips in Tips and Code for Empirical AI Safety Research.

  • Broader Skills: Teach skills essential for a successful researcher that often get neglected:

    • Theory of impact exercises and AI Safety strategy

    • Project management frameworks

    • Reflection structures (e.g., what went well that day)

    • Applied rationality exercises

    • Collaboration and conflict resolution
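To make the "estimating GPU needs" item above concrete, here is the kind of back-of-envelope arithmetic Week Zero could walk through. These are standard rules of thumb (2 bytes per parameter for bf16 inference, roughly 16 bytes per parameter for full fine-tuning with mixed-precision Adam), not figures from the ARENA curriculum, and they ignore activations, the KV cache, and batch size.

```python
# Back-of-envelope GPU memory estimates; rules of thumb only.
# Real usage also depends on activations, KV cache, and batch size.


def inference_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory to hold the weights, e.g. bf16/fp16 = 2 bytes per parameter."""
    return n_params * bytes_per_param / 1e9


def full_finetune_memory_gb(n_params: float) -> float:
    """Full fine-tuning with Adam in mixed precision: roughly
    2 (bf16 weights) + 2 (gradients) + 4 (fp32 master weights)
    + 8 (Adam moment estimates) = ~16 bytes per parameter."""
    return n_params * 16 / 1e9


for n in [7e9, 70e9]:
    print(
        f"{n / 1e9:.0f}B params: ~{inference_memory_gb(n):.0f} GB to load, "
        f"~{full_finetune_memory_gb(n):.0f} GB for a full fine-tune (before activations)"
    )
```

The point is less the exact numbers than getting participants used to sanity-checking whether an experiment fits on the hardware they can actually get.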

The Software Engineering Week

A potential alternative for Week 4 is a pure Software Engineering Week, where participants contribute to open-source packages in collaboration with their maintainers. This is an excellent way to teach hard software engineering skills and build up "taste" for good software, a concern that is only growing with the rise of LLM coding.

Partnership and Mentoring

To maximize value, ARENA could partner with research programs like MATS.

  • One-Week Mentors: Existing participants from the partner programs (e.g., MATS) would serve as one-week project mentors during the themed sprints.

  • Benefits for Mentors: This provides low-stakes, time-boxed practice at mentoring, which is a hard skill to learn. It is a much gentler on-ramp than a three-month project lead role (which I held in AI Safety Camp and found challenging).

  • Benefits for Participants: Increased networking, access to active researchers, direct guidance, and more relevant research directions set by the mentors.

ML4Good best practices

Any new structure should embed the good practices of programs like ML4Good to create a positive learning environment, a sense of community, and a safe space for both personal and technical growth. For details, see my post about it.

Scalability

[Edit: I no longer think this is an important or defining feature.]

The new model is significantly easier to scale:

  1. Ditching TAs/Teaching: Eliminates a primary bottleneck for the current program.

  2. Flexible Structure: The format is fundamentally flexible; anyone can pick it up and run a mini-sprint. The core ARENA team can provide structures and instructions to enable bottom-up organization globally. For example, one can imagine something like “a one-week Mech Interp sprint at this university, where the application is 30 questions based on 4 notebooks with an 80% pass rate”.

  3. Online Feasibility: There is no fundamental reason this cannot be done online, similar to how Apart runs global hackathons.

Potential Downsides

[Edit: the strongest downsides have been suggested by commenters. Humans still have an edge over AI.]

One potential downside is the reduced incentive for the ARENA team to create new ARENA-style notebooks (e.g., for control research). However, since the team is already heavily bottlenecked on time for new notebook development, this might not be a real disadvantage. Both systems suffer from the same staffing problem.

Another downside is the implication that this has to replace ARENA. It could instead be a separate, parallel initiative. However, I do believe that the ARENA team and ARENA participants would be better served by moving toward the model I am suggesting.

I am actually struggling to think of downsides. I asked Gemini and here are its thoughts along with my counters:

  • Exclusion of true beginners/​high barrier to entry.

    • ARENA already has a high barrier to entry.

  • Risk of superficial projects and high failure rate.

    • The primary goal is upskilling, and high failure rates are normal in research anyway.

  • Inadequate mentoring depth in only 1 week

    • The primary aim isn’t to provide deep mentoring. Furthermore, if the mentor-mentee relationship is positive during the one week, there is nothing stopping them from collaborating in the long run.

  • Gaming the automated prerequisite system

    • Yes, that is a risk, but it is not a fundamental flaw of the idea. Furthermore, I am hoping that selection effects (which is where gaming of entry requirements matters most, I think) play a smaller role in this new format, because there are concrete outputs that can be judged instead. For example, if somebody cheats their way in but produces bad outputs, people running other programs should be wary. In other words, if people say they attended this program, they should be obliged to share their outputs so that other programs can evaluate them properly.

  • Dilution of brand, if people globally run ‘bottom-up’ versions

    • This problem is already solved. Only the ARENA team can use the official brand, and anyone who runs a version of the program independently is obliged to acknowledge ARENA and to make explicit that they are not running an official ARENA program.

I asked Claude to review this post and it came up with some other downsides. Again, Claude’s comments are followed by mine.

  • What if a one-week sprint produces low-quality research that hurts participants’ confidence?

    • This is part of the learning experience, and we should have structures and advice in place to help deal with it.

  • What if rapid churn of different themes prevents deep learning?

    • This matches the pace of ARENA.

  • What if removing the “ARENA experience” community-building aspect (by going online/​distributed) reduces downstream networking value?

    • I am not suggesting removing the in-person experience.

Making It Happen

If you think this is a good idea, then the obvious question is how do we make this happen? Unfortunately, I probably don’t have the time to make this happen, but I’d definitely like to be involved. Possible next steps include:

  • Forming a core organizing group and writing a grant application to CG. [Interesting side note: Gemini hallucinated here and instead recommended an ACX Grant, which I did not provide in its context. But my instructions to Gemini did mention that I wanted a LessWrong post.]

  • Trying a pilot iteration on a low-cost basis, such as at the EA Hotel.

  • Fleshing out details. I know many of the ideas above are preliminary, but there’s enough of a skeleton to get things going.

If you have any feedback or want to get involved, please share in the comments.