Owain_Evans

Karma: 4,664

https://owainevans.github.io/

Out-of-Context Reasoning (OOCR) in LLMs: A Short Primer and Reading List

Owain_Evans · 23 May 2026 2:46 UTC
34 points
0 comments · 5 min read · LW link
(outofcontextreasoning.com)

Negation Neglect: When models fail to learn negations in training

18 May 2026 18:37 UTC
113 points
35 comments · 8 min read · LW link

A Research Agenda for Secret Loyalties

13 May 2026 17:34 UTC
32 points
3 comments · 3 min read · LW link

Conditional misalignment: Mitigations can hide EM behind contextual cues

1 May 2026 20:09 UTC
67 points
2 comments · 11 min read · LW link

Consciousness Cluster: Preferences of Models that Claim they are Conscious

18 Mar 2026 16:06 UTC
88 points
30 comments · 5 min read · LW link

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

18 Dec 2025 20:21 UTC
154 points
11 comments · 8 min read · LW link
(arxiv.org)

Weird Generalization & Inductive Backdoors

11 Dec 2025 18:18 UTC
153 points
8 comments · 8 min read · LW link

Lessons from Studying Two-Hop Latent Reasoning

11 Sep 2025 17:53 UTC
68 points
19 comments · 2 min read · LW link
(arxiv.org)

Harmless reward hacks can generalize to misalignment in LLMs

26 Aug 2025 17:32 UTC
52 points
7 comments · 7 min read · LW link

Concept Poisoning: Probing LLMs without probes

5 Aug 2025 17:00 UTC
60 points
5 comments · 13 min read · LW link