Fazl
Karma: 44
Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda, LawrenceC and Fazl · 3 May 2024 1:18 UTC · 47 points · 4 comments · 1 min read
Early Experiments in Reward Model Interpretation Using Sparse Autoencoders
lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty · 3 Oct 2023 7:45 UTC · 11 points · 0 comments · 5 min read
Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results
Esben Kran, Fazl, Sabrina Zaki, gabrielrecc and rz2383 · 23 Feb 2023 10:48 UTC · 8 points · 0 comments · 6 min read