Someone should do the obvious experiments and replications.
Ryan Greenblatt recently posted three technical blog posts reporting on interesting experimental results. One of them demonstrated that recent LLMs can make use of filler tokens to improve their performance; another attempted to measure the time horizon of LLMs not using CoT; and the third demonstrated recent LLMs’ ability to do 2-hop and 3-hop reasoning.
I think all three of these experiments led to interesting results and improved our understanding of LLM capabilities in an important safety-relevant area (reasoning without visible traces), and I’m very happy Ryan did them.
I also think all three experiments look pretty obvious in hindsight. LLMs not being able to use filler tokens and having trouble with 2-hop reasoning were both famous results that already lived in my head as important pieces of information about what LLMs can do without visible reasoning traces. As far as I can tell, Ryan’s two posts simply try to replicate these two famous observations on more recent LLMs. The post measuring the no-CoT time horizon is not a replication, but it also doesn’t feel like a ground-breaking idea once the concept of increasing time horizons is already known.
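(To make the setups concrete, here is a minimal sketch of the kind of comparison the filler-token experiment runs. This is my own illustrative toy version, not Ryan’s actual code: the API client, model name, padding scheme, and toy arithmetic task are all placeholder assumptions.)

```python
# Minimal, illustrative sketch of a filler-token comparison (not Ryan's actual code).
# Assumptions: the OpenAI chat completions API, a placeholder model name, and a toy
# multiplication task standing in for the real benchmark.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder; any chat model without hidden reasoning

def ask(question: str, n_filler: int) -> str:
    """Ask for an immediate answer, optionally padding the prompt with filler dots."""
    filler = " ".join("." for _ in range(n_filler))
    prompt = (
        f"{question}\n"
        f"{filler}\n"
        "Answer with just the final number and no reasoning."
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=10,
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

def accuracy(problems: list[tuple[str, str]], n_filler: int) -> float:
    """Fraction of problems answered correctly at a given amount of filler."""
    return sum(ask(q, n_filler) == a for q, a in problems) / len(problems)

# Toy problems; a real run would use a harder benchmark and many more samples.
problems = [(f"What is {a} * {b}?", str(a * b)) for a, b in [(37, 48), (59, 73), (86, 94)]]
for n in (0, 200):  # no filler vs. a block of 200 filler tokens
    print(f"filler={n}: accuracy={accuracy(problems, n):.2f}")
```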
My understanding is that the technical execution of these experiments wasn’t especially difficult either; in particular, they didn’t require any specialized machine learning expertise. (I might be wrong here, and I wonder how many hours Ryan spent on these experiments. I also wonder about the compute budget of these experiments; I don’t have a great estimate of that.)
I think it’s not good that these experiments were only run now, and that they needed to be run by Ryan, one of the leading AI safety researchers. Possibly I’m underestimating the difficulty of coming up with these experiments and running them, but I think ideally these should have been done by a MATS scholar, or better yet by an eager beginner on a career transition grant who wants to demonstrate their abilities so they can get into MATS later.
Before accepting my current job, I was thinking about returning to Hungary and starting a small org with some old friends who have more coding experience, living on Eastern European salaries, and just churning out one simple experiment after another. One of the primary things I hoped to do with this org was to go through famous old results and try to replicate them. I like to think we would have gotten to the filler-token and 2-hop reasoning replications too. I also had many half-baked ideas for new simple experiments building on other famous results (in the way the no-CoT time horizon experiment is one interesting follow-up to the rising-time-horizons result).
I eventually ended up doing something else, and I think my current job is probably a more important thing for me to do than trying to run the simple-experiments org. But if someone is more excited about technical research than I am, I think they should seriously consider doing this. I think funding could probably be found, and there are many new people who want to get into AI safety research; one could turn these resources into a steady stream of replications and variations on old research, and produce interesting results. (And it could be a valuable learning experience for the AI safety beginners doing the work.)
FWIW, Daniel Kokotajlo has commented in the past:
> If there was an org devoted to attempting to replicate important papers relevant to AI safety, I’d probably donate at least $100k to it this year, fwiw, and perhaps more on subsequent years depending on situation. Seems like an important institution to have. (This is not a promise ofc, I’d want to make sure the people knew what they were doing etc., but yeah)
> but I think ideally these should have been done by a MATS scholar, or better yet by an eager beginner on a career transition grant who wants to demonstrate their abilities so they can get into MATS later.

A problem here is that, I believe, this is on the face of it not quite aligned with MATS scholars’ career incentives: replicating existing research does not feel like the kind of project that would really advance their prospects of getting hired. At least when I was involved in hiring, I would not have counted this as strong evidence of, or training for, strong research skills (sorry for being part of the problem). On the other hand, it is entirely plausible to incorporate replication of existing research into a larger research program investigating related issues (e.g. Ryan’s experiment on the time horizon without CoT could fit well within a larger project investigating time horizons in general).
This may look different for the “eager beginners”, and something like AI Safety Camp could be a good venue for pure replications.
Interesting. My guess would have been the opposite. Ryan’s three posts all received around 150 karma and were generally well received; I think a post like this would be considered a 90th-percentile success for a MATS project. But admittedly, I’m not very calibrated about current MATS projects. It’s also possible that Ryan has good enough intuitions to have picked two replications that were likely to yield interesting results, while a less skillfully chosen replication would be more likely to just show “yep, the phenomenon observed in the old paper is still true”. That would be less successful, but I don’t know how it would compare in terms of prestige to the usual MATS projects. (My wild guess is that it would still be around the median, but I really don’t know.)
(Adding my takes in case they are useful for MATS fellows deciding what to do.) I have seen many MATS projects via attending the MATS symposiums, but I am relying on my memory of them. I would probably consider each of Ryan’s posts to be around a 60th-70th percentile MATS project. But I expect that a strong MATS scholar could do 2-5 mini-projects like this during the duration of MATS.
I think I disagree—doing research like this (especially several such projects) is really helpful for getting hired!
> Before accepting my current job, I was thinking about returning to Hungary and starting a small org with some old friends who have more coding experience, living on Eastern European salaries, and just churning out one simple experiment after another.

I think such an org should focus on automating simple safety research and paper replications with coding agents (e.g. Claude Code); a rough sketch of what that could look like follows below. My guess is that the models aren’t capable enough yet to do Ryan’s experiments autonomously, but they may be in a generation or two, and working on this early seems valuable.
The Coefficient technical grant-making team should pitch some people on doing this and just Make It Happen (although I’m obviously ignorant of their other priorities).
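For concreteness, here is a rough sketch of what dispatching replication tasks to a coding agent could look like. This is purely illustrative: the `claude -p` non-interactive invocation is an assumption about the CLI, and the task descriptions are hypothetical.

```python
# Illustrative sketch of handing paper-replication tasks to a coding agent.
# Assumption: the agent exposes a non-interactive CLI; `claude -p "<prompt>"` is used
# here as an example invocation and may need adjusting to the actual tool.
import subprocess
from pathlib import Path

# Hypothetical task descriptions, one per famous result to re-test on current models.
REPLICATION_TASKS = [
    "Replicate the filler-tokens result: measure no-CoT accuracy with and without filler.",
    "Replicate the 2-hop reasoning result: test multi-hop factual recall without CoT.",
]

def run_replication(task: str, workdir: Path) -> str:
    """Hand one replication task to the coding agent and return its transcript."""
    workdir.mkdir(parents=True, exist_ok=True)
    prompt = (
        "You are running a small ML experiment. "
        f"{task} Write and run the code, then summarize the findings in RESULTS.md."
    )
    result = subprocess.run(
        ["claude", "-p", prompt],  # assumed non-interactive invocation
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return result.stdout

for i, task in enumerate(REPLICATION_TASKS):
    transcript = run_replication(task, Path(f"runs/task_{i}"))
    print(f"task {i} finished; transcript length {len(transcript)} chars")
```

Most of the value would presumably be in the harness around the agent (task specification, result checking, compute budgets) rather than in the dispatch loop itself.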
I commented something similar about a month ago. Writing up a funding proposal took longer than expected, but we are going to send it out in the next few days. Unless something bad happens, the fiscal sponsor will be the University of Chicago, which will enable us to do some pretty cool things!
If anyone has time to look at the proposal before we send it out or wants to be involved, they can send me a dm or email (zroe@uchicago.edu).
Do you have suggestions for other particularly approachable but potentially high-impact replications or quick research sprints?
Redwood has project proposals, but these seem higher-effort than what you’re suggesting and more challenging for a beginner:
https://blog.redwoodresearch.org/p/recent-redwood-research-project-proposals