
Ethan Perez

Karma: 2,909

I’m a research scientist at Anthropic doing empirical safety research on language models. In the past, I’ve worked on automated red teaming of language models [1], the inverse scaling prize [2], learning from human feedback [3][4], and empirically testing debate [5][6], iterated amplification [7], and other methods [8] for scalably supervising AI systems as they become more capable.

Website: https://ethanperez.net/

Imitation Learning from Language Feedback

Mar 30, 2023, 2:11 PM
71 points
3 comments · 10 min read · LW link

Pretraining Language Models with Human Preferences

Feb 21, 2023, 5:57 PM
135 points
20 comments · 11 min read · LW link · 2 reviews

Inverse Scaling Prize: Second Round Winners

Jan 24, 2023, 8:12 PM
58 points
17 comments · 15 min read · LW link

Discovering Language Model Behaviors with Model-Written Evaluations

Dec 20, 2022, 8:08 PM
100 points
34 comments · 1 min read · LW link
(www.anthropic.com)

Inverse Scaling Prize: Round 1 Winners

Sep 26, 2022, 7:57 PM
93 points
16 comments · 4 min read · LW link
(irmckenzie.co.uk)

We may be able to see sharp left turns coming

Sep 3, 2022, 2:55 AM
54 points
29 comments · 1 min read · LW link

A Test for Language Model Consciousness

Ethan Perez · Aug 25, 2022, 7:41 PM
18 points
14 comments · 9 min read · LW link

Introducing the Fund for Alignment Research (We’re Hiring!)

Jul 6, 2022, 2:07 AM
62 points
0 comments · 4 min read · LW link

Announcing the Inverse Scaling Prize ($250k Prize Pool)

Jun 27, 2022, 3:58 PM
171 points
14 comments · 7 min read · LW link

RL with KL penalties is better seen as Bayesian inference

May 25, 2022, 9:23 AM
115 points
17 comments · 12 min read · LW link

Language Model Alignment Research Internships

Ethan Perez · Dec 13, 2021, 7:53 PM
74 points
1 comment · 1 min read · LW link