David Africa

Karma: 440

Research Scientist with the Alignment team at UK AISI.

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

21 Dec 2025 0:53 UTC
139 points
16 comments · 9 min read · LW link

[Paper] Does Self-Evaluation Enable Wireheading in Language Models?

David Africa · 8 Dec 2025 16:03 UTC
25 points
2 comments · 2 min read · LW link

Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior

8 Oct 2025 22:02 UTC
162 points
37 comments · 2 min read · LW link

Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity

David Africa · 6 Oct 2025 15:26 UTC
23 points
6 comments · 7 min read · LW link

No Answer Needed: Predicting LLM Answer Accuracy from Question-Only Linear Probes

16 Sep 2025 15:23 UTC
9 points
0 comments · 4 min read · LW link (arxiv.org)

Large Language Models and the Critical Brain Hypothesis

David Africa · 9 Sep 2025 15:45 UTC
33 points
0 comments · 6 min read · LW link