Dan H

Karma: 3,549

ai-frontiers.org

newsletter.safe.ai

newsletter.mlsafety.org

Dan H 1 Dec 2025 17:18 UTC
LW: 16 AF: 3
5
AF
on: A Pragmatic Vision for Interpretability
Thank you to Neel for writing this. Most people pivot quietly.

I’ve been most skeptical of mechanistic interpretability for years. I excluded interpretability in Unsolved Problems in ML Safety for this reason. Other fields like d/acc (Systemic Safety) were included though, all the way back in 2021.

Here are some earlier criticisms: https://www.lesswrong.com/posts/5HtDzRAk7ePWsiL2L/open-problems-in-ai-x-risk-pais-5#Transparency

More recent commentary: https://ai-frontiers.org/articles/the-misguided-quest-for-mechanistic-ai-interpretability

I think the community should reflect on its genius worship culture (in the case of Olah, a close friend of the inner circle) and epistemics: the approach was so dominant for years, and I think this outcome was entirely foreseeable.

MLSN #17: Measuring General AI Abilities and Mitigating Deception

Alice Blair and Dan H

19 Nov 2025 20:11 UTC

5 points

0 comments · 6 min read · LW link

(newsletter.mlsafety.org)

AISN #65: Measuring Automation and Superintelligence Moratorium Letter

Alice Blair and Dan H

29 Oct 2025 16:05 UTC

5 points

0 comments · 3 min read · LW link

(newsletter.safe.ai)

Dan H 28 Oct 2025 5:09 UTC
13 points
4
on: AIs should also refuse to work on capabilities research
This dynamic is captured in IABIED’s story and this paper from 2023: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4445706

AISN#64: New AGI Definition and Senate Bill Would Establish Liability for AI Harms

Corin Katzke and Dan H

16 Oct 2025 18:06 UTC

5 points

1 comment · 5 min read · LW link

(aisafety.substack.com)

AISN #63: California’s SB-53 Passes the Legislature

Corin Katzke and Dan H

24 Sep 2025 17:02 UTC

6 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

Dan H 17 Sep 2025 3:46 UTC
4 points
0
in reply to: Thane Ruthenis’s comment on: Thane Ruthenis’s Shortform

just rehearsed variations on the arguments Eliezer/MIRI already deployed

I think they’re improved and simplified.

My favorite chapter is “Chapter 5: Its Favorite Things.”

AISN #61: OpenAI Releases GPT-5

Corin Katzke and Dan H

12 Aug 2025 18:02 UTC

5 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

AISN #60: The AI Action Plan

Corin Katzke and Dan H

31 Jul 2025 18:20 UTC

6 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

AISN #59: EU Publishes General-Purpose AI Code of Practice

Corin Katzke and Dan H

15 Jul 2025 18:59 UTC

10 points

0 comments · 4 min read · LW link

(aisafety.substack.com)

AISN #58: Senate Removes State AI Regulation Moratorium

Corin Katzke and Dan H

3 Jul 2025 17:26 UTC

6 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

AISN #57: The RAISE Act

Corin Katzke and Dan H

17 Jun 2025 18:02 UTC

6 points

0 comments · 3 min read · LW link

(newsletter.safe.ai)

AISN #56: Google Releases Veo 3

Corin Katzke and Dan H

28 May 2025 16:00 UTC

7 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

AISN #55: Trump Administration Rescinds AI Diffusion Rule, Allows Chip Sales to Gulf States

Corin Katzke and Dan H

20 May 2025 16:21 UTC

6 points

1 comment · 4 min read · LW link

(forum.effectivealtruism.org)

Dan H 15 May 2025 21:36 UTC
28 points
4
on: Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies
It’s a great book: it’s simple, memorable, and unusually convincing.

AISN #54: OpenAI Updates Restructure Plan

Corin Katzke and Dan H

13 May 2025 16:59 UTC

8 points

1 comment · 4 min read · LW link

(newsletter.safe.ai)

AISN #53: An Open Letter Attempts to Block OpenAI Restructuring

Corin Katzke and Dan H

29 Apr 2025 16:13 UTC

7 points

0 comments · 4 min read · LW link

AISN#52: An Expert Virology Benchmark

Corin Katzke and Dan H

22 Apr 2025 17:08 UTC

6 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

AISN #51: AI Frontiers

Corin Katzke and Dan H

15 Apr 2025 16:01 UTC

8 points

1 comment · 5 min read · LW link

(newsletter.safe.ai)

Dan H 4 Apr 2025 20:18 UTC
3 points
4
in reply to: Tao Lin’s comment on: Good Research Takes are Not Sufficient for Good Strategic Takes
If a strategy is likely to be outdated quickly, it’s not robust and not a good strategy. Strategies should be able to withstand lots of variation.