Jacob Pfau

Karma: 844

UK AISI Alignment Team and NYU PhD student

An alignment safety case sketch based on debate

May 8, 2025, 3:02 PM
55 points
17 comments, 25 min read, LW link
(arxiv.org)

UK AISI’s Alignment Team: Research Agenda

May 7, 2025, 4:33 PM
107 points
2 comments, 11 min read, LW link

Prospects for Alignment Automation: Interpretability Case Study

Mar 21, 2025, 2:05 PM
32 points
5 comments, 8 min read, LW link

Auditing LMs with counterfactual search: a tool for control and ELK

Jacob Pfau, Feb 20, 2024, 12:02 AM
28 points
6 comments, 10 min read, LW link

LM Situational Awareness, Evaluation Proposal: Violating Imitation

Jacob Pfau, Apr 26, 2023, 10:53 PM
16 points
2 comments, 2 min read, LW link

Early situational awareness and its implications, a story

Jacob Pfau, Feb 6, 2023, 8:45 PM
29 points
6 comments, 3 min read, LW link

Jacob Pfau’s Shortform

Jacob Pfau, Jun 17, 2022, 4:40 PM
3 points
19 comments, LW link