Dmitrii Krasheninnikov
Karma: 92
Detecting High-Stakes Interactions with Activation Probes
Arrrlex, williambankes, Urja Pawar, Phil Blandfort, David Scott Krueger (formerly: capybaralet) and Dmitrii Krasheninnikov
21 Jul 2025 18:21 UTC, 50 points, 0 comments, 4 min read
A Sober Look at Steering Vectors for LLMs
Joschka Braun, Dmitrii Krasheninnikov, Usman Anwar, RobertKirk, Daniel Tan and David Scott Krueger (formerly: capybaralet)
23 Nov 2024 17:30 UTC, 40 points, 0 comments, 5 min read
Dima’s Shortform
Dmitrii Krasheninnikov
22 Aug 2024 14:49 UTC, 3 points, 0 comments, 1 min read