dwk

Karma: 55

What Drives the Compliance Gap? A Three-Driver Decomposition of Alignment Faking

28 May 2026 10:50 UTC
21 points
0 comments · 8 min read · LW link
(arxiv.org)

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

24 Feb 2025 18:31 UTC
45 points
15 comments · 11 min read · LW link