AXRP

Tag. Last edit: 28 Jan 2025 2:39 UTC by Ruby

AXRP, the AI X-risk Research Podcast, is hosted by Daniel Filan.

AXRP Episode 26 - AI Governance with Elizabeth Seger

DanielFilan, 26 Nov 2023 23:00 UTC

14 points

0 comments, 66 min read

AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability

DanielFilan, 28 Mar 2025 18:40 UTC

26 points

0 comments, 89 min read

AXRP Episode 28 - Suing Labs for AI Risk with Gabriel Weil

DanielFilan, 17 Apr 2024 21:42 UTC

12 points

0 comments, 65 min read

AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt

DanielFilan, 11 Apr 2024 21:30 UTC

69 points

10 comments, 107 min read

AXRP Episode 33 - RLHF Problems with Scott Emmons

DanielFilan, 12 Jun 2024 3:30 UTC

34 points

0 comments, 56 min read

AXRP Episode 42 - Owain Evans on LLM Psychology

DanielFilan, 6 Jun 2025 20:20 UTC

13 points

0 comments, 66 min read

AXRP Episode 48 - Guive Assadi on AI Property Rights

DanielFilan, 15 Feb 2026 2:20 UTC

22 points

0 comments, 75 min read

AXRP Episode 38.4 - Shakeel Hashim on AI Journalism

DanielFilan, 5 Jan 2025 0:20 UTC

11 points

0 comments, 12 min read

AXRP announcement: Survey, Store Closing, Patreon

DanielFilan, 28 Jun 2023 23:40 UTC

14 points

0 comments, 1 min read

AXRP Episode 29 - Science of Deep Learning with Vikrant Varma

DanielFilan, 25 Apr 2024 19:10 UTC

20 points

1 comment, 63 min read

AXRP: Store, Patreon, Video

DanielFilan, 7 Feb 2023 4:50 UTC

12 points

0 comments, 1 min read

AXRP Episode 43 - David Lindner on Myopic Optimization with Non-myopic Approval

DanielFilan, 15 Jun 2025 1:20 UTC

12 points

0 comments, 56 min read

AXRP Episode 49 - Caspar Oesterheld on Program Equilibrium

DanielFilan, 18 Feb 2026 1:30 UTC

10 points

1 comment, 72 min read

AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan, 21 Aug 2022 23:50 UTC

16 points

0 comments, 35 min read

AXRP Episode 38.6 - Joel Lehman on Positive Visions of AI

DanielFilan, 24 Jan 2025 23:00 UTC

10 points

0 comments, 9 min read

AXRP Episode 22 - Shard Theory with Quintin Pope

DanielFilan, 15 Jun 2023 19:00 UTC

52 points

11 comments, 93 min read

AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu

DanielFilan, 27 Jul 2023 1:50 UTC

22 points

0 comments, 72 min read

AXRP Episode 41 - Lee Sharkey on Attribution-based Parameter Decomposition

DanielFilan, 3 Jun 2025 3:40 UTC

28 points

1 comment, 61 min read

AXRP Episode 38.7 - Anthony Aguirre on the Future of Life Institute

DanielFilan, 9 Feb 2025 1:10 UTC

10 points

0 comments, 12 min read

AXRP Episode 34 - AI Evaluations with Beth Barnes

DanielFilan, 28 Jul 2024 3:30 UTC

23 points

0 comments, 69 min read

AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo

DanielFilan, 31 Mar 2022 5:20 UTC

28 points

1 comment, 48 min read

AXRP Episode 16 - Preparing for Debate AI with Geoffrey Irving

DanielFilan, 1 Jul 2022 22:20 UTC

20 points

0 comments, 37 min read

AXRP Episode 38.2 - Jesse Hoogland on Singular Learning Theory

DanielFilan, 27 Nov 2024 6:30 UTC

34 points

0 comments, 10 min read

AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics

DanielFilan, 29 Sep 2024 5:50 UTC

26 points

0 comments, 55 min read

AXRP Episode 44 - Peter Salib on AI Rights for Human Safety

DanielFilan, 28 Jun 2025 1:40 UTC

12 points

0 comments, 103 min read

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld

DanielFilan, 3 Oct 2023 21:50 UTC

43 points

0 comments, 92 min read

AXRP Episode 30 - AI Security with Jeffrey Ladish

DanielFilan, 1 May 2024 2:50 UTC

25 points

0 comments, 79 min read

AXRP Episode 47 - David Rein on METR Time Horizons

DanielFilan, 3 Jan 2026 0:10 UTC

21 points

0 comments, 46 min read

Video/animation: Neel Nanda explains what mechanistic interpretability is

DanielFilan, 22 Feb 2023 22:42 UTC

24 points

7 comments, 1 min read

(youtu.be)

AXRP Episode 21 - Interpretability for Engineers with Stephen Casper

DanielFilan, 2 May 2023 0:50 UTC

12 points

1 comment, 66 min read

AXRP Episode 37 - Jaime Sevilla on Forecasting AI

DanielFilan, 4 Oct 2024 21:00 UTC

21 points

3 comments, 56 min read

AXRP Episode 38.1 - Alan Chan on Agent Infrastructure

DanielFilan, 16 Nov 2024 23:30 UTC

12 points

0 comments, 14 min read

AXRP Episode 39 - Evan Hubinger on Model Organisms of Misalignment

DanielFilan, 1 Dec 2024 6:00 UTC

41 points

0 comments, 67 min read

AXRP Episode 15 - Natural Abstractions with John Wentworth

DanielFilan, 23 May 2022 5:40 UTC

34 points

1 comment, 58 min read

AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong

DanielFilan, 3 Sep 2022 23:12 UTC

12 points

1 comment, 39 min read

AXRP Episode 38.3 - Erik Jenner on Learned Look-Ahead

DanielFilan, 12 Dec 2024 5:40 UTC

20 points

0 comments, 16 min read

AXRP Episode 46 - Tom Davidson on AI-enabled Coups

DanielFilan, 7 Aug 2025 5:10 UTC

11 points

0 comments, 68 min read

AXRP Episode 31 - Singular Learning Theory with Daniel Murfet

DanielFilan, 7 May 2024 3:50 UTC

72 points

4 comments, 71 min read

AXRP Episode 38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future

DanielFilan, 1 Mar 2025 1:20 UTC

13 points

0 comments, 13 min read

AXRP Episode 19 - Mechanistic Interpretability with Neel Nanda

DanielFilan, 4 Feb 2023 3:00 UTC

45 points

0 comments, 117 min read

AXRP Episode 45 - Samuel Albanie on DeepMind’s AGI Safety Approach

DanielFilan, 6 Jul 2025 23:00 UTC

31 points

0 comments, 40 min read

AXRP Episode 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems

DanielFilan, 14 Nov 2024 7:00 UTC

14 points

0 comments, 12 min read

AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy

DanielFilan, 5 Apr 2022 23:10 UTC

25 points

10 comments, 52 min read

AXRP Episode 38.5 - Adrià Garriga-Alonso on Detecting AI Scheming

DanielFilan, 20 Jan 2025 0:40 UTC

9 points

0 comments, 16 min read

AXRP Episode 32 - Understanding Agency with Jan Kulveit

DanielFilan, 30 May 2024 3:50 UTC

20 points

0 comments, 53 min read

AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization

DanielFilan, 24 Aug 2024 22:30 UTC

21 points

0 comments, 74 min read

AXRP Episode 20 - ‘Reform’ AI Alignment with Scott Aaronson

DanielFilan, 12 Apr 2023 21:30 UTC

22 points

2 comments, 68 min read

AXRP Episode 24 - Superalignment with Jan Leike

DanielFilan, 27 Jul 2023 4:00 UTC

55 points

3 comments, 69 min read

AXRP Episode 11 - Attainable Utility and Power with Alex Turner

DanielFilan, 25 Sep 2021 21:10 UTC

19 points

5 comments, 53 min read

AXRP Episode 5 - Infra-Bayesianism with Vanessa Kosoy

DanielFilan, 10 Mar 2021 4:30 UTC

35 points

12 comments, 36 min read

AXRP Episode 7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra

DanielFilan, 28 May 2021 0:20 UTC

24 points

1 comment, 67 min read

“Infra-Bayesianism with Vanessa Kosoy” – Watch/Discuss Party

Ben Pace, 22 Mar 2021 23:44 UTC

27 points

45 comments, 1 min read

AXRP Episode 6 - Debate and Imitative Generalization with Beth Barnes

DanielFilan, 8 Apr 2021 21:20 UTC

26 points

3 comments, 60 min read

Announcing AXRP, the AI X-risk Research Podcast

DanielFilan, 23 Dec 2020 20:00 UTC

54 points

5 comments, 1 min read

(danielfilan.com)

AXRP Episode 3 - Negotiable Reinforcement Learning with Andrew Critch

DanielFilan, 29 Dec 2020 20:45 UTC

27 points

0 comments, 28 min read

AXRP Episode 4 - Risks from Learned Optimization with Evan Hubinger

DanielFilan, 18 Feb 2021 0:03 UTC

43 points

10 comments, 87 min read

AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

DanielFilan, 24 Jun 2021 22:10 UTC

59 points

2 comments, 59 min read

AXRP Episode 12 - AI Existential Risk with Paul Christiano

DanielFilan, 2 Dec 2021 2:20 UTC

38 points

0 comments, 126 min read

AXRP Episode 10 - AI’s Future and Impacts with Katja Grace

DanielFilan, 23 Jul 2021 22:10 UTC

34 points

2 comments, 77 min read

AXRP Episode 1 - Adversarial Policies with Adam Gleave

DanielFilan, 29 Dec 2020 20:41 UTC

12 points

5 comments, 34 min read

AXRP Episode 2 - Learning Human Biases with Rohin Shah

DanielFilan, 29 Dec 2020 20:43 UTC

13 points

0 comments, 35 min read

AXRP Episode 7 - Side Effects with Victoria Krakovna

DanielFilan, 14 May 2021 3:50 UTC

34 points

6 comments, 43 min read

AXRP Episode 8 - Assistance Games with Dylan Hadfield-Menell

DanielFilan, 8 Jun 2021 23:20 UTC

22 points

1 comment, 72 min read
