David Africa

Karma: 834

Research Scientist with the Alignment team at UK AISI.

What Happens When a Model Thinks It Is AGI?

23 Apr 2026 22:35 UTC
45 points
2 comments
5 min read

Gemma Gets Help: Mitigating Frustration and Self-Deletion with Consistency Training

20 Apr 2026 16:07 UTC
22 points
1 comment
12 min read

From personas to intentions: towards a science of motivations for AI models

14 Apr 2026 12:26 UTC
75 points
4 comments
7 min read

Emergent stigmergic coordination in AI agents?

David Africa
15 Mar 2026 12:30 UTC
49 points
2 comments
3 min read

Steering Awareness: Models Can Be Trained to Detect Activation Steering

12 Mar 2026 23:34 UTC
15 points
0 comments
6 min read

Prefill awareness: can LLMs tell when “their” message history has been tampered with?

9 Mar 2026 10:47 UTC
83 points
8 comments
10 min read

A Proposal for TruesightBench

David Africa
5 Feb 2026 14:33 UTC
14 points
0 comments
4 min read

Massive Activations in DroPE: Evidence for Attention Reorganization

David Africa
18 Jan 2026 15:05 UTC
19 points
0 comments
8 min read

David Africa’s Shortform

David Africa
13 Jan 2026 13:13 UTC
4 points
5 comments
1 min read

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

21 Dec 2025 0:53 UTC
200 points
25 comments
9 min read

[Paper] Does Self-Evaluation Enable Wireheading in Language Models?

David Africa
8 Dec 2025 16:03 UTC
25 points
2 comments
2 min read

Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior

8 Oct 2025 22:02 UTC
176 points
37 comments
2 min read

Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity

David Africa
6 Oct 2025 15:26 UTC
23 points
6 comments
7 min read

No Answer Needed: Predicting LLM Answer Accuracy from Question-Only Linear Probes

16 Sep 2025 15:23 UTC
10 points
0 comments
4 min read
(arxiv.org)

Large Language Models and the Critical Brain Hypothesis

David Africa
9 Sep 2025 15:45 UTC
33 points
0 comments
6 min read

Research Areas in Learning Theory (The Alignment Project by UK AISI)

1 Aug 2025 10:26 UTC
16 points
0 comments
24 min read
(alignmentproject.aisi.gov.uk)

The Alignment Project by UK AISI

1 Aug 2025 9:52 UTC
29 points
0 comments
2 min read
(alignmentproject.aisi.gov.uk)