My paper replicated and picked holes in previous mech interp papers: https://arxiv.org/abs/2407.08734
Note that the concept of replication is a bit different in ML. If an author open-sources their code (which is common), it will usually be straightforward to achieve perfect replication of their experiments.

However, what is more unusual (and what my paper did) is to exactly reimplement someone else's code from scratch. This is very valuable because you are much more likely to notice bugs and poor design choices.

I think your question is a good one. My experience suggests that many AI safety papers have important bugs and poor design decisions, most of which are never noticed. However, perfectly understanding and reimplementing someone else's experiments is very time-consuming and often not well rewarded. I mostly did it because I was creating a library, so I got additional value beyond just the paper.