RSS

Johannes Treutlein

Karma: 1,532

All opinions are my own. Homepage: johannestreutlein.com

Mod­ify­ing LLM Beliefs with Syn­thetic Doc­u­ment Finetuning

24 Apr 2025 21:15 UTC
70 points
12 comments2 min readLW link
(alignment.anthropic.com)

Au­dit­ing lan­guage mod­els for hid­den objectives

13 Mar 2025 19:18 UTC
141 points
15 comments13 min readLW link

Align­ment Fak­ing in Large Lan­guage Models

18 Dec 2024 17:19 UTC
483 points
75 comments10 min readLW link

Con­nect­ing the Dots: LLMs can In­fer & Ver­bal­ize La­tent Struc­ture from Train­ing Data

21 Jun 2024 15:54 UTC
163 points
13 comments8 min readLW link
(arxiv.org)

Re­port on mod­el­ing ev­i­den­tial co­op­er­a­tion in large worlds

Johannes Treutlein12 Jul 2023 16:37 UTC
45 points
3 comments1 min readLW link
(arxiv.org)