Tomek Korbak
Karma: 601
Aligning language models at Anthropic
https://tomekkorbak.com/
RL with KL penalties is better seen as Bayesian inference
Tomek Korbak and Ethan Perez · 25 May 2022 9:23 UTC · 114 points · 17 comments · 12 min read · LW link
Pretraining Language Models with Human Preferences
Tomek Korbak, Sam Bowman and Ethan Perez · 21 Feb 2023 17:57 UTC · 133 points · 18 comments · 11 min read · LW link
Compositional preference models for aligning LMs
Tomek Korbak · 25 Oct 2023 12:17 UTC · 18 points · 2 comments · 5 min read · LW link