Sycophancy (Tag)
Last edit: 18 Dec 2023 23:00 UTC by Maxime Riché
Steering Llama-2 with contrastive activation additions (arxiv.org)
Nina Rimsky, Wuschel Schulz, NickGabs, Meg, evhub and TurnTrout
2 Jan 2024 0:47 UTC, 120 points, 29 comments, 8 min read
Reducing sycophancy and improving honesty via activation steering
Nina Rimsky
28 Jul 2023 2:46 UTC, 117 points, 16 comments, 9 min read
Antagonistic AI
Xybermancer
1 Mar 2024 18:50 UTC, −8 points, 1 comment, 1 min read