With respect, no, we do not wirehead in any mathematical sense.
What do you mean by wireheading in a “mathematical sense”?
Human beings recognize that wireheading (e.g., heroin addiction) is a corruption of our valuation processes. Some people do it anyway, because they’re irrational or because their decision-making machinery is too corrupted for them to stop on their own, but by and large human addicts don’t want to be addicts and don’t want to want to be addicts.
Humans clearly aren’t simple reinforcement learning agents. They tend to have at least a distinction between short-term pleasure and long-term goals, and drugs mostly affect the former but not the latter. Most drug addicts are ego-dystonic with respect to drugs: they don’t want to be addicts. This means that addiction hasn’t completely changed their value system, as true wireheading would.
However, to the extent that humans can be modelled as reinforcement learning agents, they tend to display reward channel manipulation behaviors.
The basic issue, I would have to say, is that human beings have multiple valuation systems. If I had to guess, “we” evolved reinforcement learning behaviors back during the “first animals with a brain” stage, and developed more complex and harder-to-fake valuation systems on top of that further down the line.
What do you mean by wireheading in a “mathematical sense”?
In the sense of preference solipsism.