Zac Hatfield-Dodds

Karma: 3,733

Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev

Claude’s new constitution

Zac Hatfield-Dodds and Drake Thomas

21 Jan 2026 19:37 UTC

176 points

47 comments, 6 min read

(www.anthropic.com)

Anthropic’s updated Responsible Scaling Policy

Zac Hatfield-Dodds

15 Oct 2024 16:46 UTC

38 points

3 comments, 3 min read

(www.anthropic.com)

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds

20 May 2024 4:14 UTC

25 points

21 comments, 10 min read

(www.anthropic.com)

Simple probes can catch sleeper agents

Monte M, Carson Denison, Zac Hatfield-Dodds, David Duvenaud, Sam Bowman, Ethan Perez and evhub

23 Apr 2024 21:10 UTC

131 points

21 comments, 1 min read

(www.anthropic.com)

Third-party testing as a key ingredient of AI policy

Zac Hatfield-Dodds

25 Mar 2024 22:40 UTC

11 points

1 comment, 12 min read

(www.anthropic.com)

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds

1 Nov 2023 18:10 UTC

80 points

1 comment, 4 min read

(www.anthropic.com)

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds

5 Oct 2023 21:01 UTC

289 points

22 comments, 2 min read, 1 review

(transformer-circuits.pub)

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds

19 Sep 2023 15:09 UTC

85 points

26 comments, 3 min read, 1 review

(www.anthropic.com)

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds

9 Mar 2023 16:55 UTC

173 points

40 comments, 2 min read

(www.anthropic.com)

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds

14 Jan 2023 1:22 UTC

94 points

13 comments, 1 min read

In Defence of Spock

Zac Hatfield-Dodds

21 Apr 2021 21:34 UTC

43 points

5 comments, 1 min read

Zac Hatfield Dodds’s Shortform

Zac Hatfield-Dodds

9 Mar 2021 2:39 UTC

2 points

13 comments, 1 min read