Priyanka Bharadwaj comments on Why Eliminating Deception Won’t Align AI

Priyanka Bharadwaj 15 Jul 2025 9:21 UTC
Thanks for reading! I’m especially interested in feedback from folks working on mechanistic interpretability or deception threat models. Does this framing feel complementary, orthogonal, or maybe just irrelevant to your current assumptions? Happy to be redirected if I have blind spots here.