
Tomek Korbak

Karma: 601

Aligning language models at Anthropic

https://tomekkorbak.com/

Pretraining Language Models with Human Preferences

21 Feb 2023 17:57 UTC
133 points
18 comments · 11 min read

RL with KL penalties is better seen as Bayesian inference

25 May 2022 9:23 UTC
114 points
17 comments · 12 min read

Compositional preference models for aligning LMs

Tomek Korbak · 25 Oct 2023 12:17 UTC
18 points
2 comments · 5 min read