Carson Denison

Karma: 1,517

I work on deceptive alignment and reward hacking at Anthropic.