Owain_Evans

Karma: 4,664

https://owainevans.github.io/

Out-of-Context Reasoning (OOCR) in LLMs: A Short Primer and Reading List

Owain_Evans · 23 May 2026 2:46 UTC
34 points
0 comments · 5 min read
(outofcontextreasoning.com)

Negation Neglect: When models fail to learn negations in training

18 May 2026 18:37 UTC
113 points
35 comments · 8 min read

A Research Agenda for Secret Loyalties

13 May 2026 17:34 UTC
32 points
3 comments · 3 min read

Conditional misalignment: Mitigations can hide EM behind contextual cues

1 May 2026 20:09 UTC
67 points
2 comments · 11 min read

Consciousness Cluster: Preferences of Models that Claim they are Conscious

18 Mar 2026 16:06 UTC
88 points
30 comments · 5 min read

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

18 Dec 2025 20:21 UTC
154 points
11 comments · 8 min read
(arxiv.org)

Weird Generalization & Inductive Backdoors

11 Dec 2025 18:18 UTC
153 points
8 comments · 8 min read

Lessons from Studying Two-Hop Latent Reasoning

11 Sep 2025 17:53 UTC
68 points
19 comments · 2 min read
(arxiv.org)

Harmless reward hacks can generalize to misalignment in LLMs

26 Aug 2025 17:32 UTC
52 points
7 comments · 7 min read

Concept Poisoning: Probing LLMs without probes

5 Aug 2025 17:00 UTC
60 points
5 comments · 13 min read

Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data

22 Jul 2025 16:37 UTC
348 points
40 comments · 4 min read

Backdoor awareness and misaligned personas in reasoning models

20 Jun 2025 23:38 UTC
37 points
8 comments · 6 min read

Thought Crime: Backdoors & Emergent Misalignment in Reasoning Models

16 Jun 2025 16:43 UTC
69 points
2 comments · 8 min read

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

25 Feb 2025 17:39 UTC
335 points
92 comments · 4 min read

Tell me about yourself: LLMs are aware of their learned behaviors

22 Jan 2025 0:47 UTC
136 points
5 comments · 6 min read

New, improved multiple-choice TruthfulQA

15 Jan 2025 23:32 UTC
72 points
1 comment · 3 min read

Inference-Time-Compute: More Faithful? A Research Note

15 Jan 2025 4:43 UTC
69 points
10 comments · 11 min read

Tips On Empirical Research Slides

8 Jan 2025 5:06 UTC
111 points
4 comments · 6 min read

LLMs can learn about themselves by introspection

18 Oct 2024 16:12 UTC
111 points
38 comments · 9 min read

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

8 Jul 2024 22:24 UTC
109 points
40 comments · 5 min read · 1 review