anaguma comments on Measuring and improving coding audit realism with deployment resources

anaguma 24 Mar 2026 21:13 UTC
This is interesting work, and it shows some promise. I find it surprising that even with this intervention, the model still believes the eval isn’t real 76% of the time, and I imagine this number will increase as capabilities improve.