Adam Karvonen

Karma: 1,196

Realistic Evaluations Will Not Prevent Evaluation Awareness

Adam Karvonen
24 Feb 2026 17:51 UTC
26 points
2 comments
6 min read
LW link

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

18 Dec 2025 20:21 UTC
153 points
11 comments
8 min read
LW link
(arxiv.org)

Defending Against Model Weight Exfiltration Through Inference Verification

15 Dec 2025 15:26 UTC
119 points
15 comments
8 min read
LW link