
Zac Hatfield-Dodds

Karma: 2,176

Technical staff at Anthropic, previously #3ainstitute; interdisciplinary, interested in everything; ongoing PhD in CS (learning / testing / verification), open sourcerer, more at zhd.dev

Simple probes can catch sleeper agents

23 Apr 2024 21:10 UTC
117 points
15 comments, 1 min read, LW link
(www.anthropic.com)

Third-party testing as a key ingredient of AI policy

Zac Hatfield-Dodds, 25 Mar 2024 22:40 UTC
11 points
1 comment, 12 min read, LW link
(www.anthropic.com)