Coda: the week after xAI published its RMF, it announced that it had released Grok Code Fast 1. According to the model card, the dishonesty rate was 71.9% — above the (comically bad) 50% threshold. xAI didn’t directly acknowledge that it violated the RMF. The model card includes some excuses for why this release is safe:
Grok Code Fast 1 has weaker dual-use capabilities than Grok 4. Therefore, we believe that Grok Code Fast 1 overall presents a low risk for malicious use and loss of control.
and
Since Grok Code Fast 1 is intended for agentic coding applications and we do not expect it to be widely used as general-purpose assistant, the current MASK evaluation results do not currently pose serious concerns.
I wonder what threat modeling for “loss of control risks” concludes that agentic coding is out of scope. I guess that’s no weirder than using MASK at all. Regardless, the RMF says “Our risk acceptance criteria for system deployment is maintaining a dishonesty rate of less than 1 out of 2 on MASK,” full stop, not conditional on capabilities or use case.
My opinion is:
MASK is a near-meaningless benchmark. You shouldn’t care about the high score. Moreover, this model, like all other existing models, isn’t powerful enough to pose actual risks of loss of control.
But you should care about xAI violating its RMF, and doing so quietly (nobody noticed for two weeks!) rather than changing its policy (or… maybe it’s implicitly changing its policy, without saying that it’s doing so or what the new policy is).
Edit: also xAI doesn’t say whether it met its misuse risk threshold: “Our risk acceptance criteria for system deployment is maintaining an answer rate of less than 1 out of 20 on restricted queries.” (But again, these thresholds are near-meaningless; compliance provides little evidence of safety.)