Ram Potham

Karma: 103

Astra fellow at Redwood Research

(xkcd meme)

Attack Selection In Agentic AI Control Evals Can Decrease Safety

14 Apr 2026 18:02 UTC
22 points
3 comments, 18 min read

I Tested LLM Agents on Simple Safety Rules. They Failed in Surprising and Informative Ways.

Ram Potham, 25 Jun 2025 21:39 UTC
9 points
12 comments, 6 min read

AI Control Methods Literature Review

Ram Potham, 18 Apr 2025 21:15 UTC
11 points
1 comment, 9 min read