Yes.
Zvi
Karma: 54,240
AI #138 Part 1: The People Demand Erotic Sycophants
Monthly Roundup #35: October 2025
Trade Escalation, Supply Chain Vulnerabilities and Rare Earth Metals
OpenAI #15: More on OpenAI’s Paranoid Lawfare Against Advocates of SB 53
2025 State of AI Report and Predictions
Knowing that, hopefully you wouldn’t?
Oh, of course, how silly of me!
AI #137: An OpenAI App For That
NEPA, Permitting and Energy Roundup #2
Bending The Curve
Medical Roundup #5
I was not aware of this at the time.
Sora and The Big Bright Screen Slop Machine
AI #136: A Song and Dance
Claude Sonnet 4.5 Is A Very Good Model
Claude Sonnet 4.5: System Card and Alignment
On Dwarkesh Patel’s Podcast With Richard Sutton
My guess is that, on many AI questions, if you combine effort across everyone, more time on the margin should be spent improving the core messaging rather than saturating the dialogue tree.
Would a reasonable way to summarize this be: if you train on pretend reward hacking, you get emergent misalignment that takes the form of pretending (playacting) at misbehaving and being evil, whereas if you instead train on realistic reward hacking examples, the model starts realistically (and in some ways strategically) misbehaving and doing other forms of what is essentially reward hacking?