kaivu
Karma: 86
Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
Tony Wang, Miles Wang and kaivu
15 Dec 2023 11:05 UTC
33 points, 8 comments, 10 min read
Update on Harvard AI Safety Team and MIT AI Alignment
Xander Davies, Sam Marks, kaivu, tlevin, eleni, maxnadeau and Naomi Bashkansky
2 Dec 2022 0:56 UTC
60 points, 4 comments, 8 min read