Julian Minder

Karma: 384

PhD @ EPFL with Robert West. MATS 7 Scholar with Neel Nanda. Interested in mechanistic interpretability and what the process of finetuning does to models.

Synthetic Persona Pretraining: Alignment from Token Zero

20 May 2026 14:16 UTC
109 points
26 comments, 17 min read, LW link

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

18 Dec 2025 20:21 UTC
154 points
11 comments, 8 min read, LW link
(arxiv.org)

Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences

5 Sep 2025 12:11 UTC
54 points
2 comments, 7 min read, LW link

What We Learned Trying to Diff Base and Chat Models (And Why It Matters)

30 Jun 2025 17:17 UTC
106 points
2 comments, 7 min read, LW link