Jacob Pfau

Karma: 877

UK AISI Alignment Team and NYU PhD student

Unexploitable search: blocking malicious use of free parameters

May 21, 2025, 5:23 PM
34 points
16 comments · 6 min read · LW link

An alignment safety case sketch based on debate

May 8, 2025, 3:02 PM
55 points
19 comments · 25 min read · LW link
(arxiv.org)

UK AISI’s Alignment Team: Research Agenda

May 7, 2025, 4:33 PM
109 points
2 comments · 11 min read · LW link

Prospects for Alignment Automation: Interpretability Case Study

Mar 21, 2025, 2:05 PM
32 points
5 comments · 8 min read · LW link