Maybe this phrasing is consistent with being an AI because the model reads “assistant” as “AI assistant.” But I agree that the model would suspect it’s being tested.
I agree with you that MASK might not measure what it’s supposed to, but regardless, I think other problems are much larger, including that a propensity to lie when pressured is almost entirely unrelated to misalignment risk.