Deception

Tag · Last edit: 8 Feb 2023 14:23 UTC by Roman Leventov

Deception is the act of sharing information in a way which intentionally misleads others.

Maybe Lying Can’t Exist?!

Zack_M_Davis

23 Aug 2020 0:36 UTC

58 points

16 comments · 5 min read

Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles

Zack_M_Davis

2 Mar 2024 22:05 UTC

37 points

19 comments · 58 min read

(unremediatedgender.space)

Algorithms of Deception!

Zack_M_Davis

19 Oct 2019 18:04 UTC

23 points

7 comments · 5 min read

AI Deception: A Survey of Examples, Risks, and Potential Solutions

Simon Goldstein and Peter S. Park

29 Aug 2023 1:29 UTC

52 points

3 comments · 10 min read

Conflict Theory of Bounded Distrust

Zack_M_Davis

12 Feb 2023 5:30 UTC

106 points

29 comments · 3 min read

Interpreting the Learning of Deceit

RogerDearnaley

18 Dec 2023 8:12 UTC

30 points

10 comments · 9 min read

Deep Deceptiveness

So8res

21 Mar 2023 2:51 UTC

235 points

58 comments · 14 min read

On hiding the source of knowledge

jessicata

26 Jan 2020 2:48 UTC

115 points

40 comments · 3 min read

(unstableontology.com)

Deconfusing Deception

J Bostock

29 Jan 2022 16:43 UTC

26 points

6 comments · 2 min read

LCDT, A Myopic Decision Theory

adamShimi and evhub

3 Aug 2021 22:41 UTC

57 points

50 comments · 15 min read

How likely is deceptive alignment?

evhub

30 Aug 2022 19:34 UTC

103 points

28 comments · 60 min read

SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov

19 Dec 2023 16:49 UTC

17 points

5 comments · 3 min read

If Clarity Seems Like Death to Them

Zack_M_Davis

30 Dec 2023 17:40 UTC

41 points

191 comments · 87 min read

(unremediatedgender.space)

[Question] Why do so many think deception in AI is important?

Prometheus

13 Jan 2024 8:14 UTC

23 points

12 comments · 1 min read

Difficulty classes for alignment properties

Jozdien

20 Feb 2024 9:08 UTC

33 points

5 comments · 2 min read

Lying is Cowardice, not Strategy

Connor Leahy and Gabriel Alfour

24 Oct 2023 13:24 UTC

33 points

73 comments · 5 min read

(cognition.cafe)

Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think

Zack_M_Davis

27 Dec 2019 5:09 UTC

122 points

43 comments · 8 min read · 2 reviews

Optimized Propaganda with Bayesian Networks: Comment on “Articulating Lay Theories Through Graphical Models”

Zack_M_Davis

29 Jun 2020 2:45 UTC

105 points

10 comments · 4 min read

Maybe Lying Doesn’t Exist

Zack_M_Davis

14 Oct 2019 7:04 UTC

64 points

57 comments · 8 min read

Can crimes be discussed literally?

Benquo

22 Mar 2020 20:17 UTC

102 points

38 comments · 2 min read · 3 reviews

(benjaminrosshoffman.com)

Don’t Double-Crux With Suicide Rock

Zack_M_Davis

1 Jan 2020 19:02 UTC

81 points

30 comments · 2 min read

Lying Alignment Chart

Zack_M_Davis

29 Nov 2023 16:15 UTC

76 points

17 comments · 1 min read

“Rationalizing” and “Sitting Bolt Upright in Alarm.”

Raemon

8 Jul 2019 20:34 UTC

40 points

56 comments · 4 min read

Superintelligence 11: The treacherous turn

KatjaGrace

25 Nov 2014 2:00 UTC

16 points

50 comments · 6 min read

“On Bullshit” and “On Truth,” by Harry Frankfurt

Callmesalticidae

28 Aug 2020 0:44 UTC

20 points

3 comments · 6 min read

When Someone Tells You They’re Lying, Believe Them

ymeskhout

14 Jul 2023 0:31 UTC

92 points

3 comments · 3 min read

Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research

evhub, Nicholas Schiefer, Carson Denison and Ethan Perez

8 Aug 2023 1:30 UTC

308 points

26 comments · 18 min read

Less Wrong Poetry Corner: Coventry Patmore’s “Magna Est Veritas”

Zack_M_Davis

30 Jan 2021 5:16 UTC

15 points

1 comment · 1 min read

Overconfidence is Deceit

[DEACTIVATED] Duncan Sabien

17 Feb 2021 10:45 UTC

78 points

29 comments · 11 min read

Unnatural Categories Are Optimized for Deception

Zack_M_Davis

8 Jan 2021 20:54 UTC

89 points

27 comments · 33 min read · 1 review

Communication Requires Common Interests or Differential Signal Costs

Zack_M_Davis

26 Mar 2021 6:41 UTC

40 points

13 comments · 3 min read · 1 review

[Book Review] “Houdini on Magic” by Harry Houdini

lsusr

29 Sep 2021 2:37 UTC

21 points

1 comment · 6 min read

Comment on “Deception as Cooperation”

Zack_M_Davis

27 Nov 2021 4:04 UTC

23 points

4 comments · 7 min read

On Bounded Distrust

Zvi

3 Feb 2022 14:50 UTC

135 points

19 comments · 56 min read · 1 review

(thezvi.wordpress.com)

[Question] Everyone’s mired in the deepest confusion, some of the time?

M. Y. Zuo

9 Feb 2022 2:53 UTC

1 point

2 comments · 1 min read

The Speed + Simplicity Prior is probably anti-deceptive

Yonadav Shavit

27 Apr 2022 19:30 UTC

28 points

28 comments · 12 min read

Precursor checking for deceptive alignment

evhub

3 Aug 2022 22:56 UTC

24 points

0 comments · 14 min read

“Rationalist Discourse” Is Like “Physicist Motors”

Zack_M_Davis

26 Feb 2023 5:58 UTC

133 points

152 comments · 9 min read

Contract Fraud

jefftk

1 Mar 2023 3:10 UTC

86 points

10 comments · 1 min read

(www.jefftk.com)

Notes on Honesty

David Gross

28 Oct 2020 0:54 UTC

46 points

6 comments · 18 min read

Notes on Sincerity and such

David Gross

1 Dec 2020 5:09 UTC

9 points

2 comments · 11 min read

Plausibly, almost every powerful algorithm would be manipulative

Stuart_Armstrong

6 Feb 2020 11:50 UTC

38 points

25 comments · 3 min read

Why artificial optimism?

jessicata

15 Jul 2019 21:41 UTC

67 points

29 comments · 4 min read

(unstableontology.com)

Entangled Truths, Contagious Lies

Eliezer Yudkowsky

15 Oct 2008 23:39 UTC

99 points

42 comments · 4 min read

Knowing I’m Being Tricked is Barely Enough

Elizabeth

26 Feb 2019 17:50 UTC

37 points

10 comments · 2 min read

(acesounderglass.com)

Not Technically Lying

Psychohistorian

4 Jul 2009 18:40 UTC

50 points

85 comments · 4 min read

Sex, Lies, and Dexamethasone

Jacob Falkovich

20 Feb 2018 19:56 UTC

15 points

1 comment · 9 min read

Attention! Financial scam targeting Less Wrong users

Viliam_Bur

14 May 2016 17:38 UTC

38 points

92 comments · 2 min read

If we can’t lie to others, we will lie to ourselves

paulfchristiano

26 Nov 2016 22:29 UTC

45 points

24 comments · 1 min read

(sideways-view.com)

Latent Adversarial Training

Adam Jermyn

29 Jun 2022 20:04 UTC

42 points

12 comments · 5 min read

A way to make solving alignment 10.000 times easier. The shorter case for a massive open source simbox project.

AlexFromSafeTransition

21 Jun 2023 8:08 UTC

2 points

16 comments · 14 min read

Universality Unwrapped

adamShimi

21 Aug 2020 18:53 UTC

29 points

2 comments · 18 min read

Modelling Deception

Garrett Baker

18 Jul 2022 21:21 UTC

15 points

0 comments · 7 min read

Functional silence: communication that minimizes change of receiver’s beliefs

chaosmage

12 Feb 2019 21:32 UTC

27 points

5 comments · 2 min read

Of Lies and Black Swan Blowups

Eliezer Yudkowsky

7 Apr 2009 18:26 UTC

28 points

8 comments · 1 min read

Deception as the optimal: mesa-optimizers and inner alignment

Eleni Angelou

16 Aug 2022 4:49 UTC

11 points

0 comments · 5 min read

[Linkpost] Deception Abilities Emerged in Large Language Models

Bogdan Ionut Cirstea

3 Aug 2023 17:28 UTC

12 points

0 comments · 1 min read

Three scenarios of pseudo-alignment

Eleni Angelou

3 Sep 2022 12:47 UTC

9 points

0 comments · 3 min read

Corrigibility thoughts III: manipulating versus deceiving

Stuart_Armstrong

18 Jan 2017 15:57 UTC

3 points

0 comments · 1 min read

Blatant lies are the best kind!

Benquo

3 Jul 2019 20:45 UTC

28 points

17 comments · 5 min read

(benjaminrosshoffman.com)

[LINK] EA Has A Lying Problem

Benquo

11 Jan 2017 22:31 UTC

28 points

34 comments · 1 min read

(srconstantin.wordpress.com)

Matching donation fundraisers can be harmfully dishonest.

Benquo

11 Nov 2016 21:05 UTC

18 points

6 comments · 14 min read

Misleading the witness

Bo102010

9 Aug 2009 20:13 UTC

16 points

116 comments · 2 min read

The Santa deception: how did it affect you?

Desrtopa

20 Dec 2010 22:27 UTC

30 points

204 comments · 1 min read

LLMs can strategically deceive while doing gain-of-function research

Igor Ivanov

24 Jan 2024 15:45 UTC

33 points

4 comments · 11 min read

How to solve deception and still fail.

Charlie Steiner

4 Oct 2023 19:56 UTC

40 points

7 comments · 6 min read

Thoughts On (Solving) Deep Deception

Jozdien

21 Oct 2023 22:40 UTC

66 points

2 comments · 6 min read

Deception Chess

Chris Land

1 Jan 2024 15:40 UTC

7 points

2 comments · 4 min read

Monitoring for deceptive alignment

evhub

8 Sep 2022 23:07 UTC

135 points

8 comments · 9 min read

Getting up to Speed on the Speed Prior in 2022

robertzk

28 Dec 2022 7:49 UTC

36 points

5 comments · 65 min read

The commercial incentive to intentionally train AI to deceive us

Derek M. Jones

29 Dec 2022 11:30 UTC

5 points

1 comment · 4 min read

(shape-of-code.com)

My Clients, The Liars

ymeskhout

5 Mar 2024 21:06 UTC

238 points

85 comments · 7 min read

Deceptive failures short of full catastrophe.

Alex Lawsen

15 Jan 2023 19:28 UTC

33 points

5 comments · 9 min read

(Partial) failure in replicating deceptive alignment experiment

claudia.biancotti

7 Jan 2024 17:56 UTC

1 point

0 comments · 1 min read

Discussion: Challenges with Unsupervised LLM Knowledge Discovery

Seb Farquhar, Vikrant Varma, zac_kenton, gasteigerjo, Vlad Mikulik and Rohin Shah

18 Dec 2023 11:58 UTC

147 points

21 comments · 10 min read

EIS VIII: An Engineer’s Understanding of Deceptive Alignment

scasper

19 Feb 2023 15:25 UTC

30 points

5 comments · 4 min read

Sparse Features Through Time

Rogan Inglis

24 Jun 2024 18:06 UTC

12 points

1 comment · 1 min read

(roganinglis.io)

AI x-risk, approximately ordered by embarrassment

Alex Lawsen

12 Apr 2023 23:01 UTC

149 points

7 comments · 19 min read

Research Report: Incorrectness Cascades

Robert_AIZI

14 Apr 2023 12:49 UTC

19 points

0 comments · 10 min read

(aizi.substack.com)

Deception Strategies

Thoth Hermes

20 Apr 2023 15:59 UTC

−7 points

2 comments · 5 min read

(thothhermes.substack.com)

I was Wrong, Simulator Theory is Real

Robert_AIZI

26 Apr 2023 17:45 UTC

75 points

7 comments · 3 min read

(aizi.substack.com)

LM Situational Awareness, Evaluation Proposal: Violating Imitation

Jacob Pfau

26 Apr 2023 22:53 UTC

16 points

2 comments · 2 min read

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Felix Hofstätter, Francis Rhys Ward, HarrietW, LAThomson, Ollie J, Patrik Bartak and Sam F. Brown

8 Nov 2023 11:37 UTC

49 points

0 comments · 18 min read

Large Language Models can Strategically Deceive their Users when Put Under Pressure.

ReaderM

15 Nov 2023 16:36 UTC

89 points

8 comments · 2 min read

(arxiv.org)

Inducing Unprompted Misalignment in LLMs

Sam Svenningsen, evhub and Henry Sleight

19 Apr 2024 20:00 UTC

38 points

6 comments · 16 min read

Why I’m Worried About AI

peterbarnett

23 May 2022 21:13 UTC

22 points

2 comments · 12 min read

Training Trace Priors

Adam Jermyn

13 Jun 2022 14:22 UTC

12 points

17 comments · 4 min read

Multigate Priors

Adam Jermyn

15 Jun 2022 19:30 UTC

4 points

0 comments · 3 min read

Conditioning Generative Models

Adam Jermyn

25 Jun 2022 22:15 UTC

24 points

18 comments · 10 min read

Formalizing Deception

JamesH

26 Jun 2022 17:39 UTC

14 points

2 comments · 5 min read

Training Trace Priors and Speed Priors

Adam Jermyn

26 Jun 2022 18:07 UTC

17 points

0 comments · 3 min read

‘Empiricism!’ as Anti-Epistemology

Eliezer Yudkowsky

14 Mar 2024 2:02 UTC

167 points

85 comments · 25 min read

An Increasingly Manipulative Newsfeed

Michaël Trazzi

1 Jul 2019 15:26 UTC

62 points

16 comments · 5 min read

Cheerios: An “Untested New Drug”

MBlume

15 May 2009 2:26 UTC

9 points

14 comments · 1 min read

How theism works

Paul Crowley

10 Apr 2009 16:16 UTC

59 points

39 comments · 1 min read

Toy model of the AI control problem: animated version

Stuart_Armstrong

10 Oct 2017 11:06 UTC

23 points

8 comments · 1 min read

Dishonest Update Reporting

Zvi

4 May 2019 14:10 UTC

61 points

27 comments · 6 min read · 2 reviews

(thezvi.wordpress.com)

White Lies

ChrisHallquist

8 Feb 2014 1:20 UTC

60 points

902 comments · 5 min read

Are minimal circuits deceptive?

evhub

7 Sep 2019 18:11 UTC

78 points

11 comments · 8 min read

Will transparency help catch deception? Perhaps not

Matthew Barnett

4 Nov 2019 20:52 UTC

43 points

5 comments · 7 min read
