Wen Xing

Karma: 31

Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability

24 Oct 2025 17:21 UTC
18 points
1 comment · 5 min read · LW link

Vulnerability in Trusted Monitoring and Mitigations

7 Jun 2025 7:16 UTC
17 points
1 comment · 7 min read · LW link