[Question] Boxing

I hear that labs should do "boxing." I'm confused about what kinds of systems this applies to and what labs should do to improve their boxing.

Kinds of systems: my impression is that there's little direct threat from LMs simply undergoing gradient descent, but there's some risk from models deployed internally, especially if they're in scaffolding that lets them make recursive calls and use tools. I suspect the previous sentence is confused or missing a lot, though.

Boxing tactics: I hear of tactics like automatically monitoring for unusual behavior, limiting data upload speed, and using honeypots. What should labs do? If you were in charge of boxing for a lab, what would you do? What should I read to learn more?
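To make one of these tactics concrete: "limiting data upload speed" is often implemented with something like a token bucket, which caps average egress while allowing small bursts. This is a minimal illustrative sketch, not anything any lab is known to use; the class name, parameters, and byte-based accounting are all my own assumptions.

```python
import time

class TokenBucket:
    """Illustrative token-bucket limiter: caps average egress at `rate` bytes/sec.

    Hypothetical sketch of rate-limited data upload; not a real lab's mechanism.
    """

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens (bytes) refilled per second
        self.capacity = capacity  # maximum burst size in bytes
        self.tokens = capacity    # start with a full bucket
        self.last = time.monotonic()

    def allow(self, nbytes):
        """Return True if an upload of `nbytes` may proceed right now."""
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to the burst cap.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

limiter = TokenBucket(rate=10, capacity=4096)  # ~10 B/s average, 4 KiB bursts
print(limiter.allow(4096))  # initial burst fits within capacity
print(limiter.allow(1))     # bucket drained; further bytes are refused
```

A real deployment would sit at a network boundary rather than in application code, and monitoring/honeypots raise separate design questions that this doesn't touch.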
