Note: these are good points, and they also occurred to us over the course of our 3-month project. We didn’t have time to address them, but future work definitely should.
Key ToDo items:
Better, more realistic side tasks. (This will get easier once the models get better. Getting the red team models to accomplish both the main task and a realistic side task in a stealthy way was quite challenging.)
Strategic side task selection by the Red Team (only defecting when there’s actually a strategically good opportunity to do so; it’s too easy for the Blue Team if you force defections in dumb scenarios).
Fine-tuning the monitor (we almost had time, but not quite!)