kaivu
Karma: 205
AI agents and painted facades
leni, zef and kaivu
30 Aug 2025 23:13 UTC
38 points, 3 comments, 2 min read
(fulcrumresearch.ai)
Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L, bilalchughtai, Jan Betley, kaivu, Jérémy Scheurer, Mikita Balesni, AlexMeinke, Owain_Evans and Marius Hobbhahn
8 Jul 2024 22:24 UTC
109 points, 37 comments, 5 min read
Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
Tony Wang, Miles Wang and kaivu
15 Dec 2023 11:05 UTC
34 points, 8 comments, 10 min read
Update on Harvard AI Safety Team and MIT AI Alignment
Xander Davies, Sam Marks, kaivu, tlevin, leni, maxnadeau and Naomi Bashkansky
2 Dec 2022 0:56 UTC
60 points, 4 comments, 8 min read