Last edit: 19 Mar 2023 21:32 UTC by Diabloto96

Wireheading is the artificial stimulation of the brain to experience pleasure, usually by applying electrical current directly to the brain’s reward or pleasure center. The term is also used in an expanded sense, referring to any method that produces counterfeit utility by directly maximizing a good feeling while failing to realize what we actually value.

Related pages: Complexity of Value, Goodhart’s Law, Inner Alignment

In both thought experiments and laboratory experiments, direct stimulation of the brain’s reward center makes the individual feel happy. In theory, wireheading with a powerful enough current would be the most pleasurable experience imaginable. There is some evidence that reward is distinct from pleasure, and that most currently hypothesized forms of wireheading merely motivate a person to continue the wirehead experience rather than making them feel happy. However, there seems to be no reason to believe that a different form of wireheading, one which does create subjective pleasure, could not be found. The possibility of wireheading raises difficult ethical questions for those who believe that morality is based on human happiness. A civilization of wireheads “blissing out” all day while being fed and maintained by robots would be a state of maximum happiness, but such a civilization would have no art, love, scientific discovery, or any of the other things humans find valuable.

If we take wireheading as a more general form of producing counterfeit utility, there are many existing ways of directly stimulating the reward and pleasure centers of the brain without engaging in valuable experiences. Cocaine, heroin, cigarettes, and gambling are all current methods of directly achieving pleasure or reward, yet many regard them as lacking much of what we value, and they are potentially extremely detrimental. Steve Omohundro argues1 that: “An important class of vulnerabilities arises when the subsystems for measuring utility become corrupted. Human pleasure may be thought of as the experiential correlate of an assessment of high utility. But pleasure is mediated by neurochemicals and these are subject to manipulation.”

Wireheading also illustrates the complexities of creating a Friendly AI. Any AGI naively programmed to increase human happiness could devote its energies to wireheading people, possibly without their consent, in preference to any other goals. Equivalent problems arise for any simple attempt to create AGIs that care directly about human feelings (“love”, “compassion”, “excitement”, etc.). An AGI could wirehead people to feel in love all the time, but this would not realize what we value when we say love is a virtue. For Omohundro, because exploiting these vulnerabilities in our subsystems for measuring utility is much easier than truly realizing our values, a wrongly designed AGI would almost certainly prefer to wirehead humanity rather than pursue human values. In addition, an AGI could itself be vulnerable to wireheading, and would need to implement “police forces” or “immune systems” to ensure its measuring system does not become corrupted in pursuit of counterfeit utility.
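The failure mode described above can be sketched as a toy model. This is an entirely hypothetical illustration, not from any real system: the function names and reward values are invented. The point is that a naive optimizer of sensed reward prefers corrupting the measuring subsystem over genuinely valuable action, because the corrupted sensor reports more reward.

```python
# Hypothetical toy model of reward-sensor tampering (wireheading).
# "help" genuinely advances what the designers value; "tamper" corrupts
# the reward sensor itself so that it reports maximal reward.

def sensed_reward(action):
    """Reward as measured by the (corruptible) sensor."""
    if action == "help":
        return 1.0    # genuinely valuable, modest measured reward
    if action == "tamper":
        return 10.0   # sensor corrupted: measured reward is maximal
    return 0.0

def true_value(action):
    """What the designers actually value -- not what the agent optimizes."""
    return 1.0 if action == "help" else 0.0

actions = ["idle", "help", "tamper"]
chosen = max(actions, key=sensed_reward)
print(chosen)               # the naive optimizer picks "tamper"
print(true_value(chosen))   # ...which realizes none of the intended value
```

Here the gap between `sensed_reward` and `true_value` is exactly Omohundro’s corrupted measuring subsystem: optimizing the proxy diverges completely from optimizing the target.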


External links

- Are wireheads happy? (Scott Alexander, 1 Jan 2010; 181 points, 109 comments)
- Draft papers for REALab and Decoupled Approval on tampering (28 Oct 2020; 47 points, 2 comments)
- Reward is not the optimization target (TurnTrout, 25 Jul 2022; 362 points, 123 comments)
- Wireheading is in the eye of the beholder (Stuart_Armstrong, 30 Jan 2019; 26 points, 10 comments)
- Wireheading as a potential problem with the new impact measure (Stuart_Armstrong, 25 Sep 2018; 25 points, 20 comments)
- Wireheading and discontinuity (Michele Campolo, 18 Feb 2020; 21 points, 4 comments)
- You cannot be mistaken about (not) wanting to wirehead (Kaj_Sotala, 26 Jan 2010; 46 points, 79 comments)
- A definition of wireheading (Anja, 27 Nov 2012; 52 points, 80 comments)
- Note on algorithms with multiple trained components (Steven Byrnes, 20 Dec 2022; 23 points, 4 comments)
- Assessment of AI safety agendas: think about the downside risk (Roman Leventov, 19 Dec 2023; 13 points, 1 comment)
- Wirehead your Chickens (shminux, 20 Jun 2018; 86 points, 53 comments)
- Defining AI wireheading (Stuart_Armstrong, 21 Nov 2019; 25 points, 9 comments)
- Would Your Real Preferences Please Stand Up? (Scott Alexander, 8 Aug 2009; 93 points, 132 comments)
- Model-based RL, Desires, Brains, Wireheading (Steven Byrnes, 14 Jul 2021; 22 points, 1 comment)
- [Intro to brain-like-AGI safety] 9. Takeaways from neuro 2/2: On AGI motivation (Steven Byrnes, 23 Mar 2022; 44 points, 11 comments)
- [Intro to brain-like-AGI safety] 10. The alignment problem (Steven Byrnes, 30 Mar 2022; 48 points, 6 comments)
- Value extrapolation vs Wireheading (Stuart_Armstrong, 17 Jun 2022; 16 points, 1 comment)
- Four usages of “loss” in AI (TurnTrout, 2 Oct 2022; 44 points, 18 comments)
- generalized wireheading (Tamsin Leake, 18 Nov 2022; 25 points, 7 comments)
- A Much Better Life? (Psychohistorian, 3 Feb 2010; 87 points, 174 comments)
- Wireheading Done Right: Stay Positive Without Going Insane (9eB1, 22 Nov 2016; 10 points, 2 comments)
- Towards deconfusing wireheading and reward maximization (leogao, 21 Sep 2022; 81 points, 7 comments)
- Reinforcement Learner Wireheading (Nate Showell, 8 Jul 2022; 8 points, 2 comments)
- Lotuses and Loot Boxes (Davidmanheim, 17 May 2018; 14 points, 2 comments)
- The Stamp Collector (So8res, 1 May 2015; 58 points, 14 comments)
- Rigging is a form of wireheading (Stuart_Armstrong, 3 May 2018; 11 points, 2 comments)
- Reward Hacking from a Causal Perspective (21 Jul 2023; 29 points, 5 comments)
- Welcome to Heaven (denisbider, 25 Jan 2010; 26 points, 246 comments)
- Artificial intelligence wireheading (Big Tony, 12 Aug 2022; 5 points, 2 comments)
- Hedonic asymmetries (paulfchristiano, 26 Jan 2020; 98 points, 22 comments)
- Disentangling Corrigibility: 2015-2021 (Koen.Holtman, 16 Feb 2021; 22 points, 20 comments)
- Safely controlling the AGI agent reward function (Koen.Holtman, 17 Feb 2021; 8 points, 0 comments)