Neel Nanda comments on
Interpretability Will Not Reliably Find Deceptive AI
Neel Nanda
5 May 2025 12:11 UTC
2 points
I’m not trying to comment on other theories of change in this post, so no disagreement there.