David Africa
Karma: 834
Research Scientist with the Alignment team at UK AISI.
What Happens When a Model Thinks It Is AGI?
josh :)
and
David Africa
23 Apr 2026 22:35 UTC
45
points
2
comments
5
min read
Gemma Gets Help: Mitigating Frustration and Self-Deletion with Consistency Training
David Africa
and
Neil Shah
20 Apr 2026 16:07 UTC
22
points
1
comment
12
min read
From personas to intentions: towards a science of motivations for AI models
David Africa
and
Jacob Pfau
14 Apr 2026 12:26 UTC
75
points
4
comments
7
min read
Emergent stigmergic coordination in AI agents?
David Africa
15 Mar 2026 12:30 UTC
49
points
2
comments
3
min read
Steering Awareness: Models Can Be Trained to Detect Activation Steering
josh :)
and
David Africa
12 Mar 2026 23:34 UTC
15
points
0
comments
6
min read
Prefill awareness: can LLMs tell when “their” message history has been tampered with?
David Africa
,
alexsouly
,
Jordan Taylor
and
RobertKirk
9 Mar 2026 10:47 UTC
83
points
8
comments
10
min read
A Proposal for TruesightBench
David Africa
5 Feb 2026 14:33 UTC
14
points
0
comments
4
min read
Massive Activations in DroPE: Evidence for Attention Reorganization
David Africa
18 Jan 2026 15:05 UTC
19
points
0
comments
8
min read
David Africa’s Shortform
David Africa
13 Jan 2026 13:13 UTC
4
points
5
comments
1
min read
Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment
Cam
,
Puria
,
Kyle O’Brien
,
David Africa
,
Samuel Ratnam
and
andyk
21 Dec 2025 0:53 UTC
200
points
25
comments
9
min read
[Paper] Does Self-Evaluation Enable Wireheading in Language Models?
David Africa
8 Dec 2025 16:03 UTC
25
points
2
comments
2
min read
Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior
Sam Marks
,
Nevan Wichers
,
Daniel Tan
,
Aram Ebtekar
,
Jozdien
,
David Africa
,
Alex Mallen
and
Fabien Roger
8 Oct 2025 22:02 UTC
176
points
37
comments
2
min read
Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity
David Africa
6 Oct 2025 15:26 UTC
23
points
6
comments
7
min read
No Answer Needed: Predicting LLM Answer Accuracy from Question-Only Linear Probes
antonghawthorne
,
ivanvmoreno
,
Arnau Padrés Masdemont
,
David Africa
and
LorenzoPacchiardi
16 Sep 2025 15:23 UTC
10
points
0
comments
4
min read
(arxiv.org)
Large Language Models and the Critical Brain Hypothesis
David Africa
9 Sep 2025 15:45 UTC
33
points
0
comments
6
min read
Research Areas in Learning Theory (The Alignment Project by UK AISI)
David Africa
and
Edmund Lau
1 Aug 2025 10:26 UTC
16
points
0
comments
24
min read
(alignmentproject.aisi.gov.uk)
The Alignment Project by UK AISI
Mojmir
,
Benjamin Hilton
,
Jacob Pfau
,
Geoffrey Irving
,
Joseph Bloom
,
Tomek Korbak
,
David Africa
and
Edmund Lau
1 Aug 2025 9:52 UTC
29
points
0
comments
2
min read
(alignmentproject.aisi.gov.uk)