Arthur Conmy
Karma: 1,031
Interpretability
Views my own
RLHF does not appear to differentially cause mode-collapse
Arthur Conmy and beren, 20 Mar 2023 15:39 UTC
95 points, 9 comments, 3 min read, LW link

Three ways interpretability could be impactful
Arthur Conmy, 18 Sep 2023 1:02 UTC
47 points, 8 comments, 4 min read, LW link

My best guess at the important tricks for training 1L SAEs
Arthur Conmy, 21 Dec 2023 1:59 UTC
35 points, 4 comments, 3 min read, LW link

OpenAI introduce ChatGPT API at 1/10th the previous $/token
Arthur Conmy, 1 Mar 2023 20:48 UTC
28 points, 4 comments, 1 min read, LW link (openai.com)