Ethan Perez

Karma: 806

I’m a research scientist at Anthropic doing empirical AI safety research on language models. In the past, I’ve worked on automated red teaming of language models [1], the inverse scaling prize [2], learning from human feedback [3][4], and empirically testing debate [5][6], iterated amplification [7], and other methods [8] for scalably supervising AI systems as they become more capable than us.

Website: http://

Inverse Scaling Prize: Second Round Winners

24 Jan 2023 20:12 UTC
47 points
13 comments · 15 min read · LW link

Discovering Language Model Behaviors with Model-Written Evaluations

20 Dec 2022 20:08 UTC
72 points
28 comments · 1 min read · LW link

Inverse Scaling Prize: Round 1 Winners

26 Sep 2022 19:57 UTC
88 points
16 comments · 4 min read · LW link