keith_wynroe

Karma: 323

Do Models Lie More to Other Models?

keith_wynroe, 28 May 2026 19:28 UTC
13 points
0 comments, 6 min read

Asymmetry Between Defensive and Acquisitive Instrumental Deception

keith_wynroe, 10 May 2026 12:33 UTC
17 points
1 comment, 5 min read

Finding an Error-Detection Feature in DeepSeek-R1

keith_wynroe, 24 Apr 2025 16:03 UTC
23 points
0 comments, 7 min read

Decomposing the QK circuit with Bilinear Sparse Dictionary Learning

2 Jul 2024 13:17 UTC
87 points
7 comments, 12 min read

An OV-Coherent Toy Model of Attention Head Superposition

29 Aug 2023 19:44 UTC
26 points
2 comments, 6 min read

Literature review of TAI timelines

27 Jan 2023 20:07 UTC
35 points
7 comments, 2 min read
(epochai.org)

You’re Not One “You”—How Decision Theories Are Talking Past Each Other

keith_wynroe, 9 Jan 2023 1:21 UTC
30 points
11 comments, 8 min read