I think this is a relatively interesting idea as a stand-alone thing (i.e. running more week-long research sprints). This seems good and like it would be pretty useful in the AI safety community, and is underserved at the moment.
I don’t really think the framing of this as an alternative to ARENA makes much sense. Because ARENA’s main bottleneck to scaling hasn’t really been TAs. I mean it’s a bottleneck, don’t get me wrong, but I wouldn’t call it the main one. I also think having a less-structured, more research-driven model would probably require more TA involvement? If you wanted to do it well and have it be accessible to early-stage people, at least.
I’m confused about the evidence given that ARENA functions primarily as a signaling mechanism. I think that is certainly an aspect of ARENA (as with MATS, as with the completion of any high-quality program). But the fact that some people have done AI safety research before ARENA is weak evidence of this to me (I could go into more detail about this in private, but not willing to do so publicly since it pertains to ARENA’s selection process which I do not want to be gamed).
The fact that people from compressed versions of ARENA (ARBOx, which is a great program!) also go on to do great things doesn’t seem like evidence of this to me. In fact, this seems like evidence that completing a structured curriculum just makes you a better applicant to other programmes. Not sure how this is being interpreted (since this depends on what you perceive to be the signaling value of ARBOx, which I think is probably not as high as ARENA’s. I think it should be higher because it’s a great programme!)
Also we do put out impact reports where people self-assess as having improved their ability to do certain concrete tasks that we think are valuable in research. Won’t go into it in detail here because we’ve done so in impact reports in the past. E.g. the report for ARENA 6 is here: https://www.lesswrong.com/posts/WiergDX4ufcLbvwzs/arena-6-0-impact-report (Side note that I would love to see more reports like this from other programs in the ecosystem, and on a more regular basis).
I have more thoughts but will leave it there. Think this project is probably worth pursuing independently (although if you’re getting early-stage people to do research sprints, as I said earlier, I think you do need high-quality and involved mentorship that looks similar to TAs). Also think there are a lot of people doing somewhat similar things, but maybe not quite 1-week sprints.
Thanks, James, for the detailed thoughts and for reading through the post. I’ll respond once here. If we want further back and forth, better to have a chat in private so we can iron out our cruxes (and then summarize for community benefit). I’d also want to hear what others in the community think before committing to anything.
> Because ARENA’s main bottleneck to scaling hasn’t really been TAs
I am happy to defer to you regarding the scaling bottlenecks of ARENA. That’s not a big crux for the proposal.
> I’m confused about the evidence given that ARENA functions primarily as a signaling mechanism
Maybe the word signaling isn’t correct. Let me try to explain. When I point out that there are four people who did ARBOx and are now doing elite fellowship programs, my hunch is that those four had a very good chance of getting into those elite programs even if they hadn’t done ARBOx. Furthermore, if ARBOx did provide a significant boost to their profile/skillset, then one needs to consider how much extra value the extra three weeks at ARENA are providing. Another way of saying this is that ARBOx, ARENA, and these elite programs have similar selection processes. So being accepted by ARENA or ARBOx is strongly correlated with having high potential for future AI safety research, regardless of how much value the programs add on top.
> I also think having a less-structured, more research-driven model would probably require more TA involvement? If you wanted to do it well and have it be accessible to early-stage people, at least.
I do not consider participants of ARENA to be ‘early stage’. In my mind they are mid-stage (i.e. in the middle of upskilling towards a full-time researcher role) and most participants would be able to do solid research sprints without having gone through ARENA. My proposal is based on helping such mid-stage researchers. I think something like BlueDot (at least, BlueDot in 2024, I don’t know about current BlueDot) or AISC targets early-stage researchers.
> Also we do put out impact reports where people self-assess as having improved their ability
My claim (which I have not really justified, except to defer to Neel Nanda’s post) is that the counterfactual of doing four mini research sprints would be significantly higher impact. This could be the central crux.
> Side note that I would love to see more reports like this from other programs in the ecosystem, and on a more regular basis
100%. Thanks for doing this and being a role model!
Sure, I agree with most of that. I think this is probably mostly based on counterfactuals being hard to measure, in two senses:
The first is the counterfactual where participants aren’t selected for ARENA: do they then go on to do good things? We’ve taken a look at this (unpublished) and found that for people who are on the margin, attendance at ARENA has an effect. But then that effect could be explained by signaling value. It’s basically difficult to say. This is why we try to do start-of-program and end-of-program surveys to measure this. But different viewpoints are available here because it is difficult to measure definitively.
The second is the counterfactual where people spend 4 weeks doing research sprints. I basically do expect that to be more effective if you require the ARENA materials as prerequisites, but I think it would then be hard to actually get applicants to such a programme (since people generally struggle to work through ARENA materials themselves). But maybe something else could work here. I actually kind of expect the counterfactual value of that to be pretty low due to margin-based reasoning: there exist many research-oriented programmes already, but relatively fewer upskilling-oriented programmes. But again, it’s difficult to know definitively what’s more valuable on current margins (though I do think on current margins is the relevant question).
My guess is these are the two cruxes? But unsure.
> The first is the counterfactual where participants aren’t selected for ARENA
This is not a crux for me. I believe ARENA provides counterfactual value compared to not doing ARENA. You work much harder during ARENA than you otherwise would, in a great environment, with great support, etc.
> The second is the counterfactual where people spend 4 weeks doing research sprints.
This is the crux. And agreed it is hard to measure!
Thanks for engaging thoughtfully. Useful to think things through.