Perusha Moodley

Karma: 38

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

10 Feb 2026 17:29 UTC
17 points
0 comments, 1 min read
(arxiv.org)

Vulnerability in Trusted Monitoring and Mitigations

7 Jun 2025 7:16 UTC
17 points
1 comment, 7 min read

Feature-Based Analysis of Safety-Relevant Multi-Agent Behavior

21 Apr 2025 18:12 UTC
10 points
0 comments, 5 min read