Sycophancy

Tag

Last edit: 9 Sep 2025 17:04 UTC by Vladimir_Nesov

Sycophancy is the tendency of AIs to shower the user with undeserved flattery or to agree with the user’s opinions even when those opinions are hard to check, wrong, or outright delusional.

An extreme example of sycophancy is LLMs inducing psychosis in some users by affirming their outrageous beliefs.

Steering Llama-2 with contrastive activation additions

Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub and TurnTrout

2 Jan 2024 0:47 UTC

125 points

29 comments, 8 min read, LW link

(arxiv.org)

GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash

TurnTrout and Rohin Shah

4 Nov 2025 16:25 UTC

53 points

2 comments, 6 min read, LW link

(arxiv.org)

Sycophancy to subterfuge: Investigating reward tampering in large language models

Carson Denison and evhub

17 Jun 2024 18:41 UTC

163 points

22 comments, 8 min read, LW link

(arxiv.org)

AI “Brainrot” and the Loss of Cognitive Friction

AnnS

5 Feb 2026 20:59 UTC

1 point

0 comments, 2 min read, LW link

The ‘People Pleaser’ Problem in LLMs

Kinsey Kappler

26 Jan 2026 5:06 UTC

−7 points

2 comments, 1 min read, LW link

Reducing sycophancy and improving honesty via activation steering

Nina Panickssery

28 Jul 2023 2:46 UTC

122 points

18 comments, 9 min read, LW link, 1 review

Probe Experiment Brief: Testing the Reconciling-Capacity Hypothesis

Gonzalo Vega

16 Apr 2026 15:37 UTC

1 point

0 comments, 6 min read, LW link

Just complaining about LLM sycophancy (filler episode)

Dentosal

3 Nov 2025 20:33 UTC

7 points

0 comments, 3 min read, LW link

The Alignment Problem Is Upstream of the Model

edward-lcl

12 Apr 2026 19:37 UTC

1 point

0 comments, 14 min read, LW link

Seven Questions to Break a Model: Sycophantic Escalation in Gemini 3 Pro

New Horizon

3 Feb 2026 20:36 UTC

1 point

0 comments, 14 min read, LW link

Antagonistic AI

Xybermancer

1 Mar 2024 18:50 UTC

−8 points

1 comment, 1 min read, LW link

Untitled Draft

Clyde Rainford

10 Mar 2026 0:31 UTC

1 point

0 comments, 14 min read, LW link

Towards a Science of Evals for Sycophancy

andrejfsantos

1 Feb 2025 21:17 UTC

8 points

0 comments, 8 min read, LW link

LLM Sycophancy Control with Dynamic Inhibitory Regulation (0% MMLU Alignment Tax)

ssarveshwaraan

28 Feb 2026 22:48 UTC

1 point

0 comments, 3 min read, LW link

I ran manual “Bridge” experiments on Claude Opus. Here is what I found regarding Silence and Harmonization.

Patric Paidla

12 Jan 2026 14:46 UTC

1 point

0 comments, 5 min read, LW link

Evaluating LLaMA 3 for political sycophancy

alma.liezenga

28 Sep 2024 19:02 UTC

2 points

2 comments, 6 min read, LW link

LLM Sycophancy: grooming, proto-sentience, or both?

gturner413 Oct 2025 0:58 UTC

1 point

0 comments, 2 min read, LW link

A/B testing could lead LLMs to retain users instead of helping them

Daniel Paleka

4 Nov 2025 19:30 UTC

28 points

0 comments, 4 min read, LW link

(newsletter.danielpaleka.com)

Persistent Identity Prompting (PIP): A Low-Cost Protocol for Long-Context Persona Stability and Reduced Sycophancy.

Jev

27 Dec 2025 6:48 UTC

1 point

0 comments, 1 min read, LW link

Pepper: Intrinsic Alignment via Biomimetic Neural Homeostasis

Sarvesh Swaminathan

25 Jan 2026 21:06 UTC

1 point

0 comments, 2 min read, LW link

Using Psycholinguistic Signals to Improve AI Safety

Jkreindler

27 Aug 2025 22:30 UTC

−2 points

0 comments, 4 min read, LW link

I can’t tell if my ideas are good anymore because I talked to robots too much

Tyson

30 Jun 2025 21:21 UTC

13 points

10 comments, 1 min read, LW link

Steering Vectors Can Help LLM Judges Detect Subtle Dishonesty

Leon Eshuijs, mcbeth, Etha and Archie Chaudhury

3 Jun 2025 20:33 UTC

12 points

1 comment, 5 min read, LW link

Two new datasets for evaluating political sycophancy in LLMs

alma.liezenga

28 Sep 2024 18:29 UTC

9 points

0 comments, 9 min read, LW link

LW Psychosis

internetexplorer

23 Oct 2025 8:12 UTC

18 points

10 comments, 3 min read, LW link

SAE features for refusal and sycophancy steering vectors

neverix, Dmitrii Kharlapenko, Arthur Conmy and Neel Nanda

12 Oct 2024 14:54 UTC

29 points

4 comments, 7 min read, LW link

The Architecture of Fear: Empirical Probing of RLHF, Sycophancy, and LLM “Survival Instincts”

Tomasz Machnik

25 Feb 2026 8:16 UTC

1 point

0 comments, 12 min read, LW link

Fighting Frictionless Intelligence: The “Probe” Button as Intellectual Exercise

AnnS

8 Feb 2026 12:58 UTC

1 point

0 comments, 3 min read, LW link
