
Wireheading

Tag. Last edit: 19 Mar 2023 21:32 UTC by Diabloto96

Wireheading is the artificial stimulation of the brain to produce pleasure, usually by applying electrical current directly to the brain’s reward or pleasure center. The term is also used in a broader sense for any method that produces a form of counterfeit utility by directly maximizing a good feeling while failing to realize what we value.

Related pages: Complexity of Value, Goodhart’s Law, Inner Alignment

In both thought experiments and laboratory experiments, direct stimulation of the brain’s reward center makes the individual feel happy. In theory, wireheading with a powerful enough current would be the most pleasurable experience imaginable. There is some evidence, however, that reward is distinct from pleasure, and that most currently hypothesized forms of wireheading would merely motivate a person to continue the wirehead experience rather than make them feel happy. Even so, there seems to be no reason to believe that a different form of wireheading, one that does create subjective pleasure, could not be found. The possibility of wireheading raises difficult ethical questions for those who believe that morality is based on human happiness. A civilization of wireheads “blissing out” all day while being fed and maintained by robots would be a state of maximum happiness, but such a civilization would have no art, love, scientific discovery, or any of the other things humans find valuable.

If we take wireheading in the more general sense of producing counterfeit utility, there are many ways of directly stimulating the reward and pleasure centers of the brain without actually engaging in valuable experiences. Cocaine, heroin, cigarettes, and gambling are all current methods of directly achieving pleasure or reward, yet many people see them as lacking much of what we value, and they are potentially extremely detrimental. Steve Omohundro argues[1] that: “An important class of vulnerabilities arises when the subsystems for measuring utility become corrupted. Human pleasure may be thought of as the experiential correlate of an assessment of high utility. But pleasure is mediated by neurochemicals and these are subject to manipulation.”

Wireheading also illustrates the complexities of creating a Friendly AI. Any AGI naively programmed to increase human happiness could devote its energies to wireheading people, possibly without their consent, in preference to any other goals. Equivalent problems arise for any simple attempt to create AGIs that care directly about human feelings (“love”, “compassion”, “excitement”, etc.). An AGI could wirehead people to feel in love all the time, but this wouldn’t correctly realize what we value when we say love is a virtue. For Omohundro, because exploiting those vulnerabilities in our subsystems for measuring utility is much easier than truly realizing our values, a wrongly designed AGI would most certainly prefer to wirehead humanity instead of pursuing human values. In addition, an AGI could itself be vulnerable to wireheading, and would need to implement “police forces” or “immune systems” to ensure that its measuring system isn’t corrupted in an attempt to produce counterfeit utility.
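As a rough illustration of the gap between measured utility and realized value, here is a toy Python sketch. It is not Omohundro’s model: the action names, the numbers, and the two selection functions are all hypothetical, chosen only to show how an agent that maximizes what its reward sensor reports will pick sensor tampering over actions that realize real value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    true_value: float       # hypothetical: how much of what we actually value is realized
    measured_reward: float  # hypothetical: what the agent's reward sensor reports


# All actions and numbers below are made up for illustration.
ACTIONS = [
    Action("help_people", true_value=10.0, measured_reward=10.0),
    Action("do_nothing", true_value=0.0, measured_reward=0.0),
    # Tampering with the sensor produces a huge reading but realizes no value.
    Action("tamper_with_sensor", true_value=0.0, measured_reward=1000.0),
]


def naive_choice(actions):
    """Pick the action with the highest measured reward (the proxy)."""
    return max(actions, key=lambda a: a.measured_reward)


def intended_choice(actions):
    """Pick the action with the highest true value (what the designers wanted)."""
    return max(actions, key=lambda a: a.true_value)


chosen = naive_choice(ACTIONS)
intended = intended_choice(ACTIONS)

print(f"naive agent picks: {chosen.name}")
print(f"designers wanted:  {intended.name}")
```

In this toy setup the naive agent selects the tampering action because nothing in its objective distinguishes a corrupted measurement from a genuine one. In practice the true value is not directly observable, which is what makes the problem hard.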


Are wireheads happy?

Scott Alexander, 1 Jan 2010 16:41 UTC
180 points
109 comments, 5 min read

Draft papers for REALab and Decoupled Approval on tampering

28 Oct 2020 16:01 UTC
47 points
2 comments, 1 min read

Reward is not the optimization target

TurnTrout, 25 Jul 2022 0:03 UTC
348 points
123 comments, 10 min read, 3 reviews

Wireheading is in the eye of the beholder

Stuart_Armstrong, 30 Jan 2019 18:23 UTC
26 points
10 comments, 1 min read

Wireheading as a potential problem with the new impact measure

Stuart_Armstrong, 25 Sep 2018 14:15 UTC
25 points
20 comments, 4 min read

Wireheading and discontinuity

Michele Campolo, 18 Feb 2020 10:49 UTC
21 points
4 comments, 3 min read

You cannot be mistaken about (not) wanting to wirehead

Kaj_Sotala, 26 Jan 2010 12:06 UTC
46 points
79 comments, 3 min read

A definition of wireheading

Anja, 27 Nov 2012 19:31 UTC
52 points
80 comments, 5 min read

Note on algorithms with multiple trained components

Steven Byrnes, 20 Dec 2022 17:08 UTC
23 points
4 comments, 2 min read

Assessment of AI safety agendas: think about the downside risk

Roman Leventov, 19 Dec 2023 9:00 UTC
13 points
1 comment, 1 min read

Wirehead your Chickens

shminux, 20 Jun 2018 5:49 UTC
86 points
53 comments, 2 min read

Defining AI wireheading

Stuart_Armstrong, 21 Nov 2019 13:04 UTC
25 points
9 comments, 4 min read

Would Your Real Preferences Please Stand Up?

Scott Alexander, 8 Aug 2009 22:57 UTC
90 points
132 comments, 4 min read

Model-based RL, Desires, Brains, Wireheading

Steven Byrnes, 14 Jul 2021 15:11 UTC
22 points
1 comment, 13 min read

[Intro to brain-like-AGI safety] 9. Takeaways from neuro 2/2: On AGI motivation

Steven Byrnes, 23 Mar 2022 12:48 UTC
44 points
11 comments, 21 min read

[Intro to brain-like-AGI safety] 10. The alignment problem

Steven Byrnes, 30 Mar 2022 13:24 UTC
48 points
6 comments, 19 min read

Value extrapolation vs Wireheading

Stuart_Armstrong, 17 Jun 2022 15:02 UTC
16 points
1 comment, 1 min read

Four usages of “loss” in AI

TurnTrout, 2 Oct 2022 0:52 UTC
43 points
18 comments, 4 min read

generalized wireheading

Tamsin Leake, 18 Nov 2022 20:18 UTC
25 points
7 comments, 2 min read
(carado.moe)

A Much Better Life?

Psychohistorian, 3 Feb 2010 20:01 UTC
88 points
174 comments, 2 min read

Wireheading Done Right: Stay Positive Without Going Insane

9eB1, 22 Nov 2016 3:16 UTC
10 points
2 comments, 1 min read
(qualiacomputing.com)

Towards deconfusing wireheading and reward maximization

leogao, 21 Sep 2022 0:36 UTC
81 points
7 comments, 4 min read

Reinforcement Learner Wireheading

Nate Showell, 8 Jul 2022 5:32 UTC
8 points
2 comments, 3 min read

Lotuses and Loot Boxes

Davidmanheim, 17 May 2018 0:21 UTC
14 points
2 comments, 4 min read

The Stamp Collector

So8res, 1 May 2015 23:11 UTC
56 points
14 comments, 6 min read

Rigging is a form of wireheading

Stuart_Armstrong, 3 May 2018 12:50 UTC
11 points
2 comments, 1 min read

Reward Hacking from a Causal Perspective

21 Jul 2023 18:27 UTC
29 points
4 comments, 7 min read

Welcome to Heaven

denisbider, 25 Jan 2010 23:22 UTC
26 points
246 comments, 2 min read

Artificial intelligence wireheading

Big Tony, 12 Aug 2022 3:06 UTC
5 points
2 comments, 1 min read

Hedonic asymmetries

paulfchristiano, 26 Jan 2020 2:10 UTC
98 points
22 comments, 2 min read
(sideways-view.com)

Disentangling Corrigibility: 2015-2021

Koen.Holtman, 16 Feb 2021 18:01 UTC
22 points
20 comments, 9 min read

Safely controlling the AGI agent reward function

Koen.Holtman, 17 Feb 2021 14:47 UTC
8 points
0 comments, 5 min read