TheManxLoiner comments on White Box Control at UK AISI—Update on Sandbagging Investigations

TheManxLoiner 25 Jul 2025 14:52 UTC
1 point
Has the team tried noise injection as a way of detecting whether a model is sandbagging in these different scenarios?