Mary Phuong

Karma: 310

Evaluating and monitoring for AI scheming

10 Jul 2025 14:24 UTC
49 points
9 comments · 5 min read · LW link
(deepmindsafetyresearch.medium.com)

Unfaithful Reasoning Can Fool Chain-of-Thought Monitoring

2 Jun 2025 19:08 UTC
72 points
16 comments · 3 min read · LW link

Threat Model Literature Review

1 Nov 2022 11:03 UTC
78 points
4 comments · 25 min read · LW link

Clarifying AI X-risk

1 Nov 2022 11:03 UTC
127 points
24 comments · 4 min read · LW link · 1 review