Dan H
This dynamic is captured in IABIED’s story and this paper from 2023: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4445706
Some say the book just rehearsed variations on the arguments Eliezer/MIRI already deployed, but I think they’re improved and simplified. My favorite chapter is “Chapter 5: Its Favorite Things.”
It’s a great book: it’s simple, memorable, and unusually convincing.
Thank you to Neel for writing this. Most people pivot quietly.
I’ve been among the most skeptical of mechanistic interpretability for years, and I excluded interpretability from Unsolved Problems in ML Safety for this reason. Other areas like d/acc (Systemic Safety) were included, all the way back in 2021.
Here are some earlier criticisms: https://www.lesswrong.com/posts/5HtDzRAk7ePWsiL2L/open-problems-in-ai-x-risk-pais-5#Transparency
More recent commentary: https://ai-frontiers.org/articles/the-misguided-quest-for-mechanistic-ai-interpretability
I think the community should reflect on its genius-worship culture (in Olah’s case, a close friend of the inner circle) and its epistemics: the approach was dominant for years, and I think this outcome was entirely foreseeable.