Monte M
Karma: 751
ActAdd: Steering Language Models without Optimization
technicalities, TurnTrout, lisathiergart, David Udell, Ulisse Mini and Monte M
6 Sep 2023 17:21 UTC, 95 points, 3 comments, 2 min read (arxiv.org)
Open problems in activation engineering
TurnTrout, woog, lisathiergart, Monte M and Ulisse Mini
24 Jul 2023 19:46 UTC, 39 points, 2 comments, 1 min read (coda.io)
Steering GPT-2-XL by adding an activation vector
TurnTrout, Monte M, David Udell, lisathiergart and Ulisse Mini
13 May 2023 18:42 UTC, 392 points, 94 comments, 50 min read
Understanding and controlling a maze-solving policy network
TurnTrout, peligrietzer, Ulisse Mini, Monte M and David Udell
11 Mar 2023 18:59 UTC, 308 points, 22 comments, 23 min read