Jacob Pfau answers How far along METR’s law can AI start automating or helping with alignment research?

Jacob Pfau 21 Mar 2025 14:29 UTC
3 points
To apply METR’s law we should distinguish conceptual alignment work from well-defined alignment work (including empirics and theory on existing conjectures). The METR plot doesn’t tell us anything quantitative about the former.

As for the latter, let’s take interpretability as an example. We can model our uncertainty as a distribution over the time horizon that interpretability research requires, e.g. ranging over 40-1000 hours. Extrapolating METR’s trend over that distribution, I get a 66% CI of 2027-2030 for automating open-ended interp research (colab here). I’ve written up more details on this in a post here.
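
For readers who want to see the shape of this calculation, here is a minimal sketch. It is not the linked colab, and every number in it is an illustrative assumption: a log-uniform distribution over the 40-1000 hour range, a current 50% time horizon of about 1 hour in early 2025, and a fixed 7-month doubling time. The function names are hypothetical. Different choices here (especially uncertainty over the doubling time) will shift the interval, so the output should not be expected to match the 2027-2030 figure above.

```python
import math
import random


def automation_year(required_hours, current_horizon_hours=1.0,
                    doubling_months=7.0, start_year=2025.2):
    """Year at which the extrapolated time horizon reaches required_hours.

    All defaults are assumptions for illustration, not fitted values.
    """
    doublings = math.log2(required_hours / current_horizon_hours)
    return start_year + doublings * doubling_months / 12.0


def sample_years(n=10000, low_hours=40.0, high_hours=1000.0, seed=0):
    """Sample required horizons log-uniformly and map each to a year."""
    rng = random.Random(seed)
    years = []
    for _ in range(n):
        # Log-uniform: uniform in log space between the two bounds.
        log_hours = rng.uniform(math.log(low_hours), math.log(high_hours))
        years.append(automation_year(math.exp(log_hours)))
    return sorted(years)


def central_interval(sorted_values, mass=0.66):
    """Central interval containing `mass` of the sorted samples."""
    tail = (1.0 - mass) / 2.0
    last = len(sorted_values) - 1
    return sorted_values[int(tail * last)], sorted_values[int((1.0 - tail) * last)]


if __name__ == "__main__":
    low, high = central_interval(sample_years())
    print(f"66% CI under these assumptions: {low:.1f} to {high:.1f}")
```

Because the required horizon enters only through its logarithm, the forecast is much more sensitive to the assumed doubling time than to the exact endpoints of the 40-1000 hour range.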