David Africa
Karma: 269
Research Scientist with the Alignment team at UK AISI.
Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior
Sam Marks, Nevan Wichers, Daniel Tan, Aram Ebtekar, Jozdien, David Africa, Alex Mallen and Fabien Roger
8 Oct 2025 22:02 UTC, 152 points, 37 comments, 2 min read
Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity
David Africa
6 Oct 2025 15:26 UTC, 23 points, 6 comments, 7 min read
No Answer Needed: Predicting LLM Answer Accuracy from Question-Only Linear Probes
antonghawthorne, ivanvmoreno, Arnau Padrés Masdemont, David Africa and LorenzoPacchiardi
16 Sep 2025 15:23 UTC, 9 points, 0 comments, 4 min read (arxiv.org)
Large Language Models and the Critical Brain Hypothesis
David Africa
9 Sep 2025 15:45 UTC, 33 points, 0 comments, 6 min read
Research Areas in Learning Theory (The Alignment Project by UK AISI)
David Africa and Edmund Lau
1 Aug 2025 10:26 UTC, 15 points, 0 comments, 24 min read (alignmentproject.aisi.gov.uk)
The Alignment Project by UK AISI
Mojmir, Benjamin Hilton, Jacob Pfau, Geoffrey Irving, Joseph Bloom, Tomek Korbak, David Africa and Edmund Lau
1 Aug 2025 9:52 UTC, 29 points, 0 comments, 2 min read (alignmentproject.aisi.gov.uk)