Carson Denison

Karma: 286

I work on deceptive alignment and reward hacking at Anthropic.

Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research

8 Aug 2023 1:30 UTC
291 points
24 comments, 18 min read

[Question] How do I Optimize Team-Matching at Google

Carson Denison, 24 Feb 2022 22:10 UTC
8 points
1 comment, 1 min read