Cody Rushing

Karma: 583

Powerful misaligned AIs may be extremely persuasive, especially absent mitigations

Cody Rushing, 16 Jan 2026 8:08 UTC
68 points
5 comments, 14 min read

Factored Cognition Strengthens Monitoring and Thwarts Attacks

Aaron Sandoval, 18 Jun 2025 18:28 UTC
29 points
0 comments, 25 min read

Ctrl-Z: Controlling AI Agents via Resampling

16 Apr 2025 16:21 UTC
126 points
0 comments, 20 min read

[Paper] All’s Fair In Love And Love: Copy Suppression in GPT-2 Small

13 Oct 2023 18:32 UTC
82 points
4 comments, 8 min read