Honesty

Tag

Last edit: 3 Mar 2021 16:47 UTC by Yoav Ravid

Honesty means telling the truth and not being deceptive.

External Links:
Against Lie Inflation by Scott Alexander

Related Pages: Meta-Honesty, Deception.

Notes on Honesty

David Gross28 Oct 2020 0:54 UTC

46 points

7 comments20 min readLW link

Deep Honesty

Aletheophile7 May 2024 20:31 UTC

166 points

26 comments9 min readLW link

Meta-Honesty: Firming Up Honesty Around Its Edge-Cases

Eliezer Yudkowsky29 May 2018 0:59 UTC

150 points

157 comments27 min readLW link 4 reviews

Speaking Truth to Power Is a Schelling Point

Zack_M_Davis30 Dec 2019 6:12 UTC

52 points

19 comments2 min readLW link

Honesty: Beyond Internal Truth

Eliezer Yudkowsky6 Jun 2009 2:59 UTC

73 points

87 comments4 min readLW link

Assume Bad Faith

Zack_M_Davis25 Aug 2023 17:36 UTC

163 points

69 comments7 min readLW link 3 reviews

The Forces of Blandness and the Disagreeable Majority

sarahconstantin28 Apr 2019 19:44 UTC

135 points

27 comments3 min readLW link 2 reviews

(srconstantin.wordpress.com)

Truthful LMs as a warm-up for aligned AGI

Jacob_Hilton17 Jan 2022 16:49 UTC

65 points

14 comments13 min readLW link

“PR” is corrosive; “reputation” is not.

AnnaSalamon14 Feb 2021 3:32 UTC

348 points

95 comments2 min readLW link 3 reviews

On Bounded Distrust

Zvi3 Feb 2022 14:50 UTC

138 points

20 comments56 min readLW link 1 review

(thezvi.wordpress.com)

[Question] How “honest” is GPT-3?

abramdemski8 Jul 2020 19:38 UTC

72 points

18 comments5 min readLW link

Honorable AI

Kaarel24 Dec 2025 21:20 UTC

42 points

23 comments41 min readLW link

Lying is Cowardice, not Strategy

Connor Leahy and Gabriel Alfour

24 Oct 2023 13:24 UTC

19 points

74 comments5 min readLW link

(cognition.cafe)

Reflections on 4 years of meta-honesty

GradientDissenter2 Nov 2025 5:29 UTC

47 points

7 comments6 min readLW link

Argue Politics* With Your Best Friends

sarahconstantin15 Dec 2018 19:00 UTC

76 points

6 comments6 min readLW link

(srconstantin.wordpress.com)

Degrees of Radical Honesty

MBlume31 Mar 2009 20:36 UTC

34 points

51 comments3 min readLW link

Communication Requires Common Interests or Differential Signal Costs

Zack_M_Davis26 Mar 2021 6:41 UTC

41 points

13 comments3 min readLW link 1 review

Alphabetical Conundra Vol 2.

Screwtape30 Nov 2025 7:58 UTC

17 points

1 comment7 min readLW link

How to Corner Liars: A Miasma-Clearing Protocol

ymeskhout27 Feb 2025 17:18 UTC

69 points

24 comments7 min readLW link

(www.ymeskhout.com)

Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think

Zack_M_Davis27 Dec 2019 5:09 UTC

135 points

43 comments8 min readLW link 2 reviews

Integrity and accountability are core parts of rationality

habryka15 Jul 2019 20:22 UTC

179 points

69 comments6 min readLW link 1 review

Maybe Lying Doesn’t Exist

Zack_M_Davis14 Oct 2019 7:04 UTC

72 points

59 comments8 min readLW link

Optimized Propaganda with Bayesian Networks: Comment on “Articulating Lay Theories Through Graphical Models”

Zack_M_Davis29 Jun 2020 2:45 UTC

106 points

10 comments4 min readLW link

ML Safety Newsletter #20: AI Wellbeing, Classifier Jailbreaking and Honest Pushback Benchmarking

Alice Blair and Dan H

28 Apr 2026 19:16 UTC

16 points

0 comments5 min readLW link

The Good Try Rule

DirectedEvolution27 Dec 2020 2:38 UTC

56 points

4 comments4 min readLW link

Notes on Sincerity and such

David Gross1 Dec 2020 5:09 UTC

9 points

2 comments10 min readLW link

“Status” can be corrosive; here’s how I handle it

Orpheus1624 Jan 2023 1:25 UTC

72 points

8 comments6 min readLW link

Honest Friends Don’t Tell Comforting Lies

Serpent-Stare19 Apr 2018 16:34 UTC

21 points

11 comments5 min readLW link

How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?

Owain_Evans26 Feb 2022 12:46 UTC

44 points

3 comments11 min readLW link

Radical Honesty

Eliezer Yudkowsky10 Sep 2007 6:09 UTC

46 points

37 comments2 min readLW link

Paper: Teaching GPT3 to express uncertainty in words

Owain_Evans31 May 2022 13:27 UTC

97 points

7 comments4 min readLW link

LLM Style Slop is Absolutely Everywhere

silentbob28 Apr 2026 12:34 UTC

37 points

10 comments13 min readLW link

Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments

Jeffrey Ladish11 Jul 2022 19:38 UTC

98 points

27 comments6 min readLW link 1 review

“Desperate Honesty” by Agnes Callard

David Gross1 Aug 2023 13:34 UTC

11 points

0 comments2 min readLW link

(dailynous.com)

Maybe Lying Can’t Exist?!

Zack_M_Davis23 Aug 2020 0:36 UTC

61 points

16 comments5 min readLW link

Control Vectors as Dispositional Traits

Gianluca Calcagni23 Jun 2024 21:34 UTC

11 points

0 comments12 min readLW link

Layers Of Mind

PeteG4 Oct 2022 16:52 UTC

−8 points

4 comments2 min readLW link

Toxic Truth

MichaelHoward11 Apr 2009 11:25 UTC

16 points

31 comments1 min readLW link

Ground-Truth Label Imbalance Impairs the Performance of Contrast-Consistent Search (and Other Contrast-Pair-Based Unsupervised Methods)

Tom Angsten and Ami Hays

5 Aug 2023 17:55 UTC

6 points

2 comments7 min readLW link

(drive.google.com)

A Cup of Blue Tea

Rudaiba20 Oct 2025 11:22 UTC

−2 points

0 comments4 min readLW link

The Importance of Saying “Oops”

Eliezer Yudkowsky5 Aug 2007 3:17 UTC

313 points

40 comments2 min readLW link

Stress, panic, and HP damage

wanderingpostrat16 Feb 2026 23:32 UTC

1 point

0 comments2 min readLW link

Contrast Pairs Drive the Empirical Performance of Contrast Consistent Search (CCS)

Scott Emmons31 May 2023 17:09 UTC

97 points

1 comment6 min readLW link 1 review

White Lies

ChrisHallquist8 Feb 2014 1:20 UTC

60 points

903 comments5 min readLW link

Speaking up publicly is heroic

jefftk2 Nov 2019 12:00 UTC

44 points

2 comments1 min readLW link

(www.jefftk.com)

[Question] How to build common knowledge of rationality and honesty?

MikkW21 Feb 2021 6:07 UTC

6 points

3 comments1 min readLW link

You don’t need Kant

1 Apr 2009 18:09 UTC

2 points

59 comments5 min readLW link

Honesty, Openness, Trustworthiness, and Secrets

NormanPerlmutter6 Mar 2023 9:03 UTC

13 points

0 comments9 min readLW link

Verify, but Trust

berns17 Apr 2026 3:25 UTC

9 points

2 comments14 min readLW link

Truthful AI: Developing and governing AI that does not lie

Owain_Evans, owencb and Lukas Finnveden

18 Oct 2021 18:37 UTC

82 points

9 comments10 min readLW link

Smarter Models Lie Less

Expertium20 Jun 2025 13:31 UTC

6 points

0 comments2 min readLW link

How to parent more predictably

jefftk10 Jul 2018 15:18 UTC

81 points

1 comment4 min readLW link

How to find cool things in a new place

Sam F. Brown24 Jan 2023 11:20 UTC

12 points

0 comments1 min readLW link

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Felix Hofstätter, Francis Rhys Ward, HarrietW, LAThomson, Ollie J, Patrik Bartak and Sam F. Brown

8 Nov 2023 11:37 UTC

49 points

0 comments18 min readLW link

Lies and Secrets

steven04618 Mar 2009 14:43 UTC

19 points

21 comments2 min readLW link

Individual Deniability, Statistical Honesty

Alicorn9 Aug 2011 4:17 UTC

62 points

8 comments1 min readLW link

On Intentionality, or: Towards a More Inclusive Concept of Lying

Cornelius Dybdahl18 Oct 2024 10:37 UTC

8 points

0 comments4 min readLW link

The Jordan Peterson Mask

Jacob Falkovich3 Mar 2018 19:49 UTC

61 points

154 comments12 min readLW link

Truth is Universal: Robust Detection of Lies in LLMs

Lennart Buerger19 Jul 2024 14:07 UTC

24 points

4 comments2 min readLW link

(arxiv.org)

Virtues related to honesty

Orioth30 May 2025 14:11 UTC

11 points

23 comments2 min readLW link

How “Discovering Latent Knowledge in Language Models Without Supervision” Fits Into a Broader Alignment Scheme

Collin15 Dec 2022 18:22 UTC

244 points

41 comments16 min readLW link 1 review

Ethics Notes

Eliezer Yudkowsky21 Oct 2008 21:57 UTC

20 points

46 comments11 min readLW link

Protected From Myself

Eliezer Yudkowsky19 Oct 2008 0:09 UTC

49 points

30 comments6 min readLW link

[RFC] Possible ways to expand on “Discovering Latent Knowledge in Language Models Without Supervision”.

Georgios Kaklamanos, Walter Laurito, Kaarel and Kay Kozaronek

25 Jan 2023 19:03 UTC

48 points

6 comments12 min readLW link

Answer in your head

throwaway8355439 Feb 2026 7:41 UTC

16 points

2 comments3 min readLW link

Declare your signaling and hidden agendas

Kaj_Sotala13 Apr 2009 12:01 UTC

25 points

21 comments3 min readLW link

Hufflepuff Cynicism

abramdemski13 Feb 2018 2:15 UTC

29 points

18 comments6 min readLW link

Civility Is Never Neutral

ozymandias22 Nov 2017 16:54 UTC

62 points

15 comments4 min readLW link

Discovering Latent Knowledge in the Human Brain: Part 1 – Clarifying the concepts of belief and knowledge

Joseph Emerson15 Oct 2023 9:02 UTC

5 points

0 comments12 min readLW link

parenting rules

Dave Orr21 Dec 2020 19:48 UTC

160 points

9 comments5 min readLW link

Five Reasons to Lie

Dzoldzaya17 Jan 2023 16:53 UTC

0 points

19 comments3 min readLW link

[Question] [retracted] Discussion: Was SBF a naive utilitarian, or a sociopath?

Nicholas Kross17 Nov 2022 2:52 UTC

0 points

4 comments1 min readLW link

Neo-Mohism

Bae's Theorem16 Jun 2021 21:57 UTC

5 points

11 comments7 min readLW link
