Zygi Straznickas
Karma: 308
Previously: math for AI and AI for math
Now just kinda trying to figure out if AI can be made safe
Why White-Box Redteaming Makes Me Feel Weird
Zygi Straznickas · 16 Mar 2025 18:54 UTC · 209 points · 36 comments · 3 min read
Fluent dreaming for language models (AI interpretability method)
tbenthompson, mikes and Zygi Straznickas · 6 Feb 2024 6:02 UTC · 46 points · 5 comments · 1 min read · (arxiv.org)