I’m in favour of open-sourcing interpretability tools, but not evaluation tools, since I think the latter would be used to advance capabilities without substantially improving safety.