Sycophancy (tag)
Last edit: 18 Dec 2023 23:00 UTC by Maxime Riché
Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison and evhub, 17 Jun 2024 18:41 UTC, 161 points, 22 comments, 8 min read (arxiv.org)
Measuring Visual Sycophancy in Multimodal Models
Jaehyuk Lim and Bruce W. Lee, 27 Aug 2024 22:02 UTC, 8 points, 0 comments, 8 min read
Steering Llama-2 with contrastive activation additions
Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub and TurnTrout, 2 Jan 2024 0:47 UTC, 123 points, 29 comments, 8 min read (arxiv.org)
Evaluating LLaMA 3 for political sycophancy
alma.liezenga, 28 Sep 2024 19:02 UTC, 1 point, 2 comments, 6 min read
Reducing sycophancy and improving honesty via activation steering
Nina Panickssery, 28 Jul 2023 2:46 UTC, 120 points, 17 comments, 9 min read
Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga, 28 Sep 2024 18:29 UTC, 1 point, 0 comments, 9 min read
Antagonistic AI
Xybermancer, 1 Mar 2024 18:50 UTC, −8 points, 1 comment, 1 min read