Wireheading. Again, this is a general problem of reinforcement learning agents.
Humans, even those who don’t believe in supernatural souls, wirehead all the time, with respect to both evolutionary fitness and socially derived rewards.
With respect, no, we do not wirehead in any mathematical sense. Human beings recognize that wireheading (e.g., heroin addiction) is a corruption of our valuation processes. Some people do it anyway, because they’re irrational, or because their decision-making machinery is too corrupted to stop on their own, but by and large human addicts don’t want to be addicts and don’t want to want to be addicts.
In contrast, AIXI wants to wirehead, and doesn’t want to want anything (because it has no second-order thought processes).
If we understood what was going on cognitively in humans who don’t want to wirehead, people like Daniel Dewey, myself and Luke who are trying to think about AI goal-systems that won’t wirehead would get a lot further. Some “anti-wireheading math”, for instance, would make value learners and reinforcement learners follow their operators’ intentions more closely than they currently can be modelled as doing.
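To make the contrast concrete, here is a minimal sketch, in toy Python, of the difference between an agent that maximizes the number arriving on its reward channel and an agent whose utility function scores world-states. The action names and numbers are hypothetical, invented purely for illustration; this is not anyone’s actual agent design or proposed anti-wireheading math.

```python
# Toy illustration: a reward-signal maximizer vs. a state-utility maximizer.
# All actions, outcomes, and numbers are hypothetical, chosen for illustration.

# Two available actions and the outcomes they lead to.
ACTIONS = {
    "do_task":  {"reward_signal": 5.0,  "task_done": True},
    "wirehead": {"reward_signal": 99.0, "task_done": False},  # seizes the reward channel
}

def reward_maximizer(actions):
    """AIXI-style agent: maximizes whatever number arrives on its reward channel."""
    return max(actions, key=lambda a: actions[a]["reward_signal"])

def state_utility_maximizer(actions):
    """Agent whose utility function scores world-states, not the reward signal."""
    def utility(outcome):
        return 1.0 if outcome["task_done"] else 0.0
    return max(actions, key=lambda a: utility(actions[a]))

print(reward_maximizer(ACTIONS))         # -> "wirehead"
print(state_utility_maximizer(ACTIONS))  # -> "do_task"
```

The first agent has no level at which wireheading registers as a mistake: the signal on the channel is its whole optimization target, which is one way of cashing out “wants to wirehead, and doesn’t want to want anything.”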
Wireheading and drug addiction are superficially similar, but there are significant and relevant differences. First, drugs can have direct negative effects on your health in ways that have nothing to do with addiction, so even if the effects of drugs are temporarily pleasurable, the net effect of drugs on total lifetime pleasure can be negative. Second, drugs can be physically addictive, so even if you decide to stop using them, you could feel physically ill and even die from withdrawal. Third, for some drugs the body gets used to a given dosage and needs a greater dose to achieve the desired effect, which presumably wouldn’t be the case for wireheading.
Some people do it anyway, because they’re irrational, or because their decision-making machinery is too corrupted to stop on their own,
I want to wirehead. Why do you think I’m irrational or corrupted?
It can help to unpack these terms a little.

For example, I can understand how someone who values nothing other than their own pleasure (or who approximates this state well enough by valuing pleasure much, much more than anything else) can rationally choose to wirehead. And if I recall our earlier conversation correctly, you consider what people value to be equivalent to what they like, and don’t believe anyone likes anything other than their own pleasure.
Given all of that unpacking, and taking your self-description at face value, I can understand why you would choose to wirehead… indeed, I can understand why you would find it bewildering that anyone else wouldn’t, and would likely conclude that people who say stuff like that are just signalling.
But the unpacking is critical, because I don’t share your (claimed) values, so will reliably misunderstand assertions that depend implicitly on those values.
Or, at least, I reliably talk as though I didn’t share them, so I will predictably respond to such assertions as though I’d misunderstood them.
Good analysis.

What I’m curious about is why Eli Sennesh thinks that a value system like my own is irrational or corrupted.
I can understand why you would find it bewildering that anyone else wouldn’t, and would likely conclude that people who say stuff like that are just signalling.
I don’t think they’re signaling, I think they genuinely don’t want to wirehead, similarly to how everyone genuinely doesn’t want to be tortured. I just think that the division between wanting and liking is one of the greatest flaws of human psychology, and when they differ, wanting should be beaten into agreeing with liking.
What I’m curious about is why Eli Sennesh thinks that a value system like my own is irrational or corrupted.
(nods) Well, I can’t speak for Eli, but judging from what they’ve written I conclude that you and Eli don’t even agree on what a hypothetical wireheader’s values are. You would say that a wireheader’s values are necessarily to wirehead, since they evidently like wireheading… if they didn’t, they would stop. I suspect Eli would say that a wireheader’s values aren’t necessarily to wirehead, since they might not want to wirehead.
I just think that [..] when they differ, wanting should be beaten into agreeing with liking.
Right, I understand; we’ve had this discussion before. I disagree with you entirely, as I value some of the things I want. But discussing value differences rarely gets anywhere, so I’m happy to leave that there.
I want to wirehead. Why do you think I’m irrational or corrupted?
The vast majority of people who’ve tried heavy drugs or other primitive forms of “wireheading” end up wishing they hadn’t. If I predict that you don’t really want to wirehead, I’m probably right.

On the other hand, if you’re willing to do the paperwork to deal with a medical-ethics board, we could of course hook you up with an electrode to your pleasure center for a set period of time, then unhook it and “sober you up” so that you’d be non-dependent on it. If, after all that, you asked to have the electrode reinstalled so you could wirehead some more (particularly if that answer came from you rather than from other people, indicating a decision of personal values rather than mere addiction), then I would believe that you rationally desire to wirehead, and would of course arrange the medical procedures to grant your request.

But the burden of evidence needed to beat my “you don’t really want to wirehead” prior is high enough that I’d rather run the experiment than just take your word at face value.
I think that when most people talk about wireheading, they mean something like “ideal directly stimulated pleasure”, which isn’t necessarily the same thing as current wireheading. It’s quite possible for current wireheading to be flawed to such a degree that not only is it not the greatest possible pleasure, it isn’t even as much pleasure as people can reasonably get by other means. While I would want the perfect wirehead, current wireheading is less appealing.
With respect, no, we do not wirehead in any mathematical sense.
What do you mean by wireheading in a “mathematical sense”?

In the sense of preference solipsism.
Human beings recognize that wireheading (e.g., heroin addiction) is a corruption of our valuation processes. Some people do it anyway, because they’re irrational, or because their decision-making machinery is too corrupted to stop on their own, but by and large human addicts don’t want to be addicts and don’t want to want to be addicts.
Humans clearly aren’t simple reinforcement learning agents. They tend to have at least a distinction between short-term pleasure and long-term goals, and drugs mostly affect the former but not the latter. Most drug addicts are ego-dystonic with respect to drugs: they don’t want to be addicts. This means that addiction hasn’t completely changed their value system the way true wireheading would.
However, to the extent that humans can be modelled as reinforcement learning agents, they tend to display reward channel manipulation behaviors.
The basic issue, I would have to say, is that human beings have multiple valuation systems. If I had to guess, “we” evolved reinforcement learning behaviors back during the “first animals with a brain” stage, and developed more complex and harder-to-fake valuation systems on top of that further down the line.
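That layered picture can be made concrete with a similarly hedged sketch: a fast reinforcement-learned “wanting” signal that a drug can inflate, underneath a slower model-based evaluator that it can’t touch. All the action names and numbers below are invented for illustration; the point is only that two separately computed valuations can come apart, producing exactly the ego-dystonic pattern described above.

```python
# Hypothetical two-layer valuation sketch: an old reinforcement-learned "wanting"
# signal plus a newer model-based "endorsement" check, computed separately.

def wanting(action, craving_boost=0.0):
    """Fast, hackable layer: learned reward estimates, inflatable by a drug."""
    base = {"work": 3.0, "take_drug": 2.0}
    return base[action] + (craving_boost if action == "take_drug" else 0.0)

def endorsement(action):
    """Slow, model-based layer: evaluates predicted life outcomes instead."""
    return {"work": 4.0, "take_drug": -10.0}[action]

for boost in (0.0, 50.0):  # before and after addiction inflates the craving
    most_wanted = max(["work", "take_drug"], key=lambda a: wanting(a, boost))
    endorsed = max(["work", "take_drug"], key=endorsement)
    print(f"boost={boost}: wants {most_wanted!r}, endorses {endorsed!r}")
# boost=0.0:  wants 'work',      endorses 'work'
# boost=50.0: wants 'take_drug', endorses 'work'  <- ego-dystonic addiction
```

An agent with only the first layer would be the simple reinforcement learner of the original objection; it is the disagreement between the two layers that lets an addict not want to be an addict.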