Slightly late update: I am in the middle of projects which would be hard to do outside Anthropic. I will push them for at least 2-4 more months before reconsidering my decision and writing up my views.
Quick retro on the decision I made to move from Redwood to Anthropic a year ago:
Working on a project with interesting core ideas (and that is tractable to make progress on) is the main predictor of research success. My relatively low output over the past year can in large part be explained by not finding projects with sufficiently good core ideas.
When it comes to working on the right project, collaborating with Ryan Greenblatt and Buck Shlegeris is an amazing opportunity. I intend to continue working at Anthropic for now, but I plan to collaborate with them more closely than I have in the past.
The single factor I underestimated the most is how helpful access to data is, in particular access to coding agent usage data from hundreds of employees and to training/eval data from production training runs. (For the sort of research I am most excited about, I agree with Ryan’s take that private access to frontier models is overrated. Though some of that is contingent on open-source models not being that far from the frontier, which was not obvious a year ago. If the only good reasoning models were closed source and didn’t show reasoning traces, private access to frontier models would have been more important.)
The single factor I overestimated the most is how much working at an AI company would make it easier to run experiments that are technically possible outside of one (relative to working at a well-funded AI safety org like Redwood). I don’t know how general this is: it did not help my projects much, but other people have different opinions here.
I think Redwood people are surprisingly in touch with reality, and in particular I still think that Redwood-style futurism is better than forecasts based on the present situation at AI companies. I think first- and second-hand contact with reality from working at an AI company is still helpful, but in different ways:
It helps you understand all the random stuff that makes things hard to do for real (to give a simple example, I had never heard of zero-data-retention before joining Anthropic).
It helps you understand why people disagree with you on more political stuff (I think I am much closer to passing an ideological Turing test for AI company leadership—which I think is pretty helpful for things like designing plausible red lines that trigger certain mitigations).
You can get this information by talking regularly with people who work at AI companies, but confidentiality makes that much higher-friction, so it doesn’t happen as much.