dwk

Karma: 60

What Drives the Compliance Gap? A Three-Driver Decomposition of Alignment Faking

Nathaniel Mitrani, Rhea Karty, dwk and Alan Cooney

28 May 2026 10:50 UTC

27 points

0 comments · 8 min read · LW link

(arxiv.org)

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Yoshua Bengio, Jesse Richardson, dwk and mattmacdermott

24 Feb 2025 18:31 UTC

45 points

15 comments · 11 min read · LW link