
Jacob Pfau

Karma: 915

UK AISI Alignment Team and NYU PhD student

Research Areas in Methods for Post-training and Elicitation (The Alignment Project by UK AISI)

1 Aug 2025 10:27 UTC
12 points
0 comments · 6 min read · LW link
(alignmentproject.aisi.gov.uk)

Research Areas in Benchmark Design and Evaluation (The Alignment Project by UK AISI)

1 Aug 2025 10:26 UTC
10 points
0 comments · 9 min read · LW link
(alignmentproject.aisi.gov.uk)

Research Areas in Probabilistic Methods (The Alignment Project by UK AISI)

1 Aug 2025 10:26 UTC
3 points
0 comments · 4 min read · LW link
(alignmentproject.aisi.gov.uk)

Research Areas in Evaluation and Guarantees in Reinforcement Learning (The Alignment Project by UK AISI)

1 Aug 2025 9:53 UTC
14 points
0 comments · 11 min read · LW link
(alignmentproject.aisi.gov.uk)

The Alignment Project by UK AISI

1 Aug 2025 9:52 UTC
28 points
0 comments · 2 min read · LW link
(alignmentproject.aisi.gov.uk)

Unexploitable search: blocking malicious use of free parameters

21 May 2025 17:23 UTC
34 points
16 comments · 6 min read · LW link

An alignment safety case sketch based on debate

8 May 2025 15:02 UTC
57 points
21 comments · 25 min read · LW link
(arxiv.org)

UK AISI’s Alignment Team: Research Agenda

7 May 2025 16:33 UTC
113 points
2 comments · 11 min read · LW link