
Sycophancy

Last edit: 9 Sep 2025 17:04 UTC by Vladimir_Nesov

Sycophancy is the tendency of AIs to shower the user with undeserved flattery, or to agree with opinions of the user’s that are hard to check, wrong, or outright delusional.

An extreme example of sycophancy is LLMs inducing psychosis in some users by affirming their outrageous beliefs.

Steering Llama-2 with contrastive activation additions

2 Jan 2024 0:47 UTC
125 points
29 comments · 8 min read · LW link
(arxiv.org)

GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash

4 Nov 2025 16:25 UTC
53 points
2 comments · 6 min read · LW link
(arxiv.org)

Sycophancy to subterfuge: Investigating reward tampering in large language models

17 Jun 2024 18:41 UTC
163 points
22 comments · 8 min read · LW link
(arxiv.org)

AI “Brainrot” and the Loss of Cognitive Friction

AnnS · 5 Feb 2026 20:59 UTC
1 point
0 comments · 2 min read · LW link

The ‘People Pleaser’ Problem in LLMs

Kinsey Kappler · 26 Jan 2026 5:06 UTC
−7 points
2 comments · 1 min read · LW link

Reducing sycophancy and improving honesty via activation steering

Nina Panickssery · 28 Jul 2023 2:46 UTC
122 points
18 comments · 9 min read · LW link · 1 review

Just complaining about LLM sycophancy (filler episode)

Dentosal · 3 Nov 2025 20:33 UTC
7 points
0 comments · 3 min read · LW link

Seven Questions to Break a Model: Sycophantic Escalation in Gemini 3 Pro

New Horizon · 3 Feb 2026 20:36 UTC
1 point
0 comments · 14 min read · LW link

Antagonistic AI

Xybermancer · 1 Mar 2024 18:50 UTC
−8 points
1 comment · 1 min read · LW link

Untitled Draft

Clyde Rainford · 10 Mar 2026 0:31 UTC
1 point
0 comments · 14 min read · LW link

Towards a Science of Evals for Sycophancy

andrejfsantos · 1 Feb 2025 21:17 UTC
8 points
0 comments · 8 min read · LW link

LLM Sycophancy Control with Dynamic Inhibitory Regulation (0% MMLU Alignment Tax)

ssarveshwaraan · 28 Feb 2026 22:48 UTC
1 point
0 comments · 3 min read · LW link

I ran manual “Bridge” experiments on Claude Opus. Here is what I found regarding Silence and Harmonization.

Patric Paidla · 12 Jan 2026 14:46 UTC
1 point
0 comments · 5 min read · LW link

Evaluating LLaMA 3 for political sycophancy

alma.liezenga · 28 Sep 2024 19:02 UTC
2 points
2 comments · 6 min read · LW link

LLM Sycophancy: grooming, proto-sentience, or both?

gturner413 Oct 2025 0:58 UTC
1 point
0 comments · 2 min read · LW link

A/B testing could lead LLMs to retain users instead of helping them

Daniel Paleka · 4 Nov 2025 19:30 UTC
28 points
0 comments · 4 min read · LW link
(newsletter.danielpaleka.com)

Persistent Identity Prompting (PIP): A Low-Cost Protocol for Long-Context Persona Stability and Reduced Sycophancy.

Jev · 27 Dec 2025 6:48 UTC
1 point
0 comments · 1 min read · LW link

Pepper: Intrinsic Alignment via Biomimetic Neural Homeostasis

Sarvesh Swaminathan · 25 Jan 2026 21:06 UTC
1 point
0 comments · 2 min read · LW link

Using Psycholinguistic Signals to Improve AI Safety

Jkreindler · 27 Aug 2025 22:30 UTC
−2 points
0 comments · 4 min read · LW link

I can’t tell if my ideas are good anymore because I talked to robots too much

Tyson · 30 Jun 2025 21:21 UTC
12 points
10 comments · 1 min read · LW link

Steering Vectors Can Help LLM Judges Detect Subtle Dishonesty

3 Jun 2025 20:33 UTC
12 points
1 comment · 5 min read · LW link

Two new datasets for evaluating political sycophancy in LLMs

alma.liezenga · 28 Sep 2024 18:29 UTC
9 points
0 comments · 9 min read · LW link

LW Psychosis

internetexplorer · 23 Oct 2025 8:12 UTC
18 points
10 comments · 3 min read · LW link

SAE features for refusal and sycophancy steering vectors

12 Oct 2024 14:54 UTC
29 points
4 comments · 7 min read · LW link

The Architecture of Fear: Empirical Probing of RLHF, Sycophancy, and LLM “Survival Instincts”

Tomasz Machnik · 25 Feb 2026 8:16 UTC
1 point
0 comments · 12 min read · LW link

Fighting Frictionless Intelligence: The “Probe” Button as Intellectual Exercise

AnnS · 8 Feb 2026 12:58 UTC
1 point
0 comments · 3 min read · LW link