I think the extent to which it’s possible to publish without giving away commercially sensitive information depends a lot on exactly what kind of “safety work” it is. For example, if you figured out a way to stop models from reward hacking on unit tests, it’s probably to your advantage to not share that with competitors.
I think the extent to which it’s possible to publish without giving away commercially sensitive information depends a lot on exactly what kind of “safety work” it is. For example, if you figured out a way to stop models from reward hacking on unit tests, it’s probably to your advantage to not share that with competitors.