Sam Marks

Karma: 3,189

Modifying LLM Beliefs with Synthetic Document Finetuning

Apr 24, 2025, 9:15 PM
70 points
12 comments · 2 min read · LW link
(alignment.anthropic.com)

Downstream applications as validation of interpretability progress

Sam Marks · Mar 31, 2025, 1:35 AM
112 points
3 comments · 7 min read · LW link

Auditing language models for hidden objectives

Mar 13, 2025, 7:18 PM
141 points
15 comments · 13 min read · LW link