Paul Christiano on Dwarkesh Podcast

Link post

Dwarkesh’s summary:

Paul Christiano is the world’s leading AI safety researcher. My full episode with him is out!

We discuss:

  • Does he regret inventing RLHF, and is alignment necessarily dual-use?

  • Why he has relatively modest timelines (40% by 2040, 15% by 2030),

  • What we want the post-AGI world to look like (do we want to keep gods enslaved forever?),

  • Why he’s leading the push to get labs to develop responsible scaling policies, and what it would take to prevent an AI coup or bioweapon,

  • His current research into a new proof system, and how it could solve alignment by explaining a model’s behavior,

  • and much more.

Crossposted to EA Forum (5 points, 0 comments)