In a few months, I will be leaving Redwood Research (where I am working as a researcher) and I will be joining one of Anthropic’s safety teams.
I think that, over the past year, Redwood has done some of the best AGI safety research and I expect it will continue doing so when I am gone.
At Anthropic, I will help Ethan Perez’s team pursue research directions that in part stemmed from research done at Redwood. I have already talked with Ethan on many occasions, and I’m excited about the safety research I’m going to be doing there. Note that I don’t endorse everything Anthropic does; the main reason I am joining is that I might do better and/or higher-impact research there.
I did almost all my research at Redwood and under the guidance of the brilliant people working there, so I don’t know yet how happy I will be about my impact working in another research environment, with other research styles, perspectives, and opportunities—that’s something I will learn while working there. I will reconsider whether to stay at Anthropic, return to Redwood, or go elsewhere in February/March next year (Manifold market here), and I will release an internal and an external write-up of my views.
Alas, seems like a mistake. My advice is at least to somehow divest from the Anthropic equity, which I expect will have a large effect on your cognition one way or another.
I vaguely second this. My (intuitive, sketchy) sense is that Fabien has the ready capacity to be high integrity. (And I don’t necessarily mind kinda mixing expectation with exhortation about that.) A further exhortation for Fabien: insofar as it feels appropriate, keep your eyes open, looking at both yourself and others, for “large effects on your cognition one way or another”—”black box” (https://en.wikipedia.org/wiki/Flight_recorder) info about such contexts is helpful for the world!
Slightly late update: I am in the middle of projects which would be hard to do outside Anthropic. I will push them for at least 2-4 more months before reconsidering my decision and writing up my views.
Quick retro on the decision I made to move from Redwood to Anthropic a year ago:
Working on a project with interesting core ideas (and that is tractable to make progress on) is the main predictor of research success. My relatively low output over the past year can in large part be explained by not finding projects that had sufficiently good core ideas.
For working on the right project, collaborating with Ryan Greenblatt and Buck Shlegeris is an amazing opportunity. I intend to continue working at Anthropic for now, but I plan to collaborate with them more closely than I did in the past.
The single factor I underestimated the most is how helpful access to data is, and in particular access to coding agent usage data from hundreds of employees and training/eval data from production training runs. (For the sort of research I am most excited about, I agree with Ryan’s take that private access to frontier models is overrated. Though some of that is contingent on open source models being not that far from the frontier—which was not obvious a year ago. If the only good reasoning models were closed source and didn’t show reasoning traces, private access to frontier models would have been more important.)
The single factor I overestimated the most is how much working at an AI company would make experiments which are technically possible outside of an AI company easier (relative to working in a well-funded AI safety org like Redwood), though I don’t know how general this is (it did not help my projects much, but other people have different opinions here).
I think Redwood people are surprisingly in touch with reality, and in particular I still think that Redwood-style futurism is better than forecasts based on the present situation at AI companies. I think first- and second-hand contact with reality from working at an AI company is still helpful, but in different ways:
It helps you understand all the random stuff that makes things hard to do for real (to give a simple example, I had never heard of zero-data-retention before joining Anthropic).
It helps you understand why people disagree with you on more political stuff (I think I am much closer to passing an ideological Turing test for AI company leadership—which I think is pretty helpful for things like designing plausible red lines that trigger certain mitigations).
You can get this information by talking on a very regular basis with people who work at AI companies, but confidentiality makes it much higher friction and therefore it doesn’t happen as much.
Words are not endorsement; contributing actions are. I suspect what you’re doing could be on net very positive. Please don’t assume your coworkers are sanely trying to make AI have a good outcome unless you can personally push them towards it. If things are healthy, they will already be expecting this attitude and welcome it greatly. Please assume that aligning Claude to Anthropic is insufficient: Anthropic must also be aligned, and as a corporation, it is by default not going to be. Be kind, but don’t trust people to resist incentives unless you can do it yourself and pull them towards doing so.
Congrats on the new role! I appreciate you sharing this here.
If you’re able to share more, I’d be curious to learn more about your uncertainties about the transition. Based on your current understanding, what are the main benefits you’re hoping to get at Anthropic? In February/March, what are the key areas you’ll be reflecting on when you decide whether to stay at Anthropic or come back to Redwood?
Obviously, your February/March write-up will not necessarily conform to these “pre-registered” considerations. But nonetheless, I think pre-registering some considerations or uncertainties in advance could be a useful exercise (and I would certainly find it interesting!)
The main consideration is whether I will have better and/or higher impact safety research there (at Anthropic I will have a different research environment, with other research styles, perspectives, and opportunities, which I may find better). I will also consider indirect impact (e.g. I might be indirectly helping Anthropic instead of another organization gain influence, unclear sign) and personal (non-financial) stuff. I’m not very comfortable sharing more at the moment, but I have a big Google doc that I have shared with some people I trust.
Makes sense. I think the thing I’m trying to point at is: “what do you think better safety research actually looks like?”
I suspect there’s some risk that, absent some sort of pre-registration, your definition of “good safety research” ends up gradually drifting to be more compatible with the kind of research Anthropic does.
Of course, not all of this will be a bad thing: hopefully you will genuinely learn some new things that change your opinion of what “good research” is.
But the nice thing about pre-registration is that you can be more confident that belief changes stem from a deliberate or at least self-aware process, as opposed to some sort of “maybe I thought this all along / I didn’t really know what I believed before I joined” vibe. (And perhaps this is sufficiently covered in your doc.)