Jacob Pfau
Karma: 915
UK AISI Alignment Team and NYU PhD student
Research Areas in Methods for Post-training and Elicitation (The Alignment Project by UK AISI)
Jacob Pfau and Benjamin Hilton · 1 Aug 2025 10:27 UTC · 12 points · 0 comments · 6 min read · (alignmentproject.aisi.gov.uk)
Research Areas in Benchmark Design and Evaluation (The Alignment Project by UK AISI)
Jacob Pfau and Benjamin Hilton · 1 Aug 2025 10:26 UTC · 10 points · 0 comments · 9 min read · (alignmentproject.aisi.gov.uk)
Research Areas in Probabilistic Methods (The Alignment Project by UK AISI)
Jacob Pfau and Benjamin Hilton · 1 Aug 2025 10:26 UTC · 3 points · 0 comments · 4 min read · (alignmentproject.aisi.gov.uk)
Research Areas in Evaluation and Guarantees in Reinforcement Learning (The Alignment Project by UK AISI)
Jacob Pfau and Benjamin Hilton · 1 Aug 2025 9:53 UTC · 14 points · 0 comments · 11 min read · (alignmentproject.aisi.gov.uk)
The Alignment Project by UK AISI
Mojmir, Benjamin Hilton, Jacob Pfau, Geoffrey Irving, Joseph Bloom, Tomek Korbak, David Africa and Edmund Lau · 1 Aug 2025 9:52 UTC · 28 points · 0 comments · 2 min read · (alignmentproject.aisi.gov.uk)
Unexploitable search: blocking malicious use of free parameters
Jacob Pfau and Geoffrey Irving · 21 May 2025 17:23 UTC · 34 points · 16 comments · 6 min read
An alignment safety case sketch based on debate
Marie_DB, Jacob Pfau, Benjamin Hilton and Geoffrey Irving · 8 May 2025 15:02 UTC · 57 points · 21 comments · 25 min read · (arxiv.org)
UK AISI’s Alignment Team: Research Agenda
Benjamin Hilton, Jacob Pfau, Marie_DB and Geoffrey Irving · 7 May 2025 16:33 UTC · 113 points · 2 comments · 11 min read
Prospects for Alignment Automation: Interpretability Case Study
Jacob Pfau and Geoffrey Irving · 21 Mar 2025 14:05 UTC · 32 points · 5 comments · 8 min read
Auditing LMs with counterfactual search: a tool for control and ELK
Jacob Pfau · 20 Feb 2024 0:02 UTC · 28 points · 6 comments · 10 min read
LM Situational Awareness, Evaluation Proposal: Violating Imitation
Jacob Pfau · 26 Apr 2023 22:53 UTC · 16 points · 2 comments · 2 min read
Early situational awareness and its implications, a story
Jacob Pfau · 6 Feb 2023 20:45 UTC · 29 points · 6 comments · 3 min read
Jacob Pfau’s Shortform
Jacob Pfau · 17 Jun 2022 16:40 UTC · 3 points · 19 comments