David Africa

Karma: 269

Research Scientist with the Alignment team at UK AISI.

Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior

8 Oct 2025 22:02 UTC
152 points
37 comments
2 min read
LW link

Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity

David Africa
6 Oct 2025 15:26 UTC
23 points
6 comments
7 min read
LW link

No Answer Needed: Predicting LLM Answer Accuracy from Question-Only Linear Probes

16 Sep 2025 15:23 UTC
9 points
0 comments
4 min read
LW link
(arxiv.org)

Large Language Models and the Critical Brain Hypothesis

David Africa
9 Sep 2025 15:45 UTC
33 points
0 comments
6 min read
LW link

Research Areas in Learning Theory (The Alignment Project by UK AISI)

1 Aug 2025 10:26 UTC
15 points
0 comments
24 min read
LW link
(alignmentproject.aisi.gov.uk)

The Alignment Project by UK AISI

1 Aug 2025 9:52 UTC
29 points
0 comments
2 min read
LW link
(alignmentproject.aisi.gov.uk)