Cody Rushing

Karma: 583

Powerful misaligned AIs may be extremely persuasive, especially absent mitigations

Cody Rushing

16 Jan 2026 8:08 UTC

68 points

5 comments, 14 min read

Factored Cognition Strengthens Monitoring and Thwarts Attacks

Aaron Sandoval

18 Jun 2025 18:28 UTC

29 points

0 comments, 25 min read

Ctrl-Z: Controlling AI Agents via Resampling

Aryan Bhatt, Buck, Adam Kaufman and Tyler Tracy

16 Apr 2025 16:21 UTC

128 points

0 comments, 20 min read

[Paper] All’s Fair In Love And Love: Copy Suppression in GPT-2 Small

CallumMcDougall, Arthur Conmy, Tom McGrath and Neel Nanda

13 Oct 2023 18:32 UTC

82 points

4 comments, 8 min read