Michael Ripa answers Where are the AI safety replications?

Michael Ripa 27 Jul 2025 16:30 UTC
5 points
0
For interpretability research, there is an ongoing effort to build a set of tutorials that replicate results from recent papers using NNsight: https://nnsight.net/applied_tutorials/
What I find cool about this particular effort is that, because the implementations use NNsight, the experiments are easier to adapt to new models and can be run remotely.
(Disclaimer: I work on the NDIF/NNsight project, though not on this initiative, so take my enthusiasm with a grain of salt.)