The only counterarguments I can think of would be:
1. The claim that the likelihood of s-risks is close to that of x-risks does not seem well argued to me. In particular, conflict seems to be the most plausible scenario (and one which has a high prior placed on it as we can observe that much suffering today is caused by conflict), but it seems to be less and less likely of a scenario once you factor in superintelligence, as multi-polar scenarios seem to be either very short-lived or unlikely to happen at all.
2. We should be wary of applying anthropomorphic traits to hypothetical artificial agents in the future. Pain in biological organisms may very well have evolved as a proxy for negative utility, and might not be necessary in “pure” agent intelligences which can calculate utility functions directly. It’s not obvious to me that implementing suffering in the sense that humans understand it would be cheaper or more efficient for a superintelligence than simply creating utility-maximizers when it needs to produce a large number of sub-agents.
3. There is high overlap between approaches to mitigating x-risk and approaches to mitigating s-risks. If the best chance of mitigating future suffering is trying to bring about a friendly artificial intelligence explosion, then it seems that the approaches we are currently taking should still be the correct ones.
4. More speculatively: if we focus heavily on s-risks, does this open us up to issues regarding utility-monsters? Can I extort people by creating a simulation of trillions of agents and then threatening to minimize their utility? (If we simply value the sum of utility, and not necessarily the complexity of the agent having the utility, then this should be relatively cheap to implement.)
I think the most general response to your first three points would look something like this: Any superintelligence that achieves human values will be adjacent in design space to many superintelligences that cause massive suffering, so it’s quite likely that the wrong superintelligence will win, due to human error, malice, or arms races.
As to your last point, it looks more like a research problem than a counterargument, and I’d be very interested in any progress on that front :-)
So being served a cup of coffee and being served a cup of pure capsaicin are “adjacent in design space”? Maybe, but funny how that problem doesn’t arise or even worry anyone...
That’s a twist on a standard LW argument, see e.g. here:
Fragility of value is the thesis that losing even a small part of the rules that make up our values could lead to results that most of us would now consider as unacceptable
It seems to me that fragility of value can lead to massive suffering in many ways.
You’re basically dialing that argument up to eleven. From “losing a small part could lead to unacceptable results” you are jumping to “losing any small part will lead to unimaginable hellscapes”:
with a tall sharp peak (FAI) surrounded by a pit that’s astronomically deeper
Yeah, not all parts. But even if it’s a 1% chance, one hellscape might balance out a hundred universes where FAI wins. Pain is just too effective at creating disutility. I understand why people want to be optimistic, but I think being pessimistic in this case is more responsible.
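A rough sketch of the arithmetic behind this claim, under loudly stated assumptions: outcomes are reduced to a single FAI utility and a single hellscape disutility, and the numbers (1 unit of utility per FAI universe, 100 units of disutility per hellscape) are purely illustrative, not figures anyone in the thread committed to. The function names are hypothetical.

```python
from fractions import Fraction

def expected_value(p_hell, u_fai, d_hell):
    """Expected utility when a hellscape (disutility d_hell) occurs with
    probability p_hell and an FAI win (utility u_fai) occurs otherwise."""
    return (1 - p_hell) * u_fai - p_hell * d_hell

def break_even_ratio(p_hell):
    """How many times worse than an FAI win is good a hellscape must be
    for the expected value to be exactly zero."""
    return (1 - p_hell) / p_hell

p = Fraction(1, 100)  # the "1% chance" from the comment; Fraction avoids float rounding
print(break_even_ratio(p))        # ratio at which the two outcomes cancel
print(expected_value(p, 1, 100))  # illustrative: hellscape 100x as bad as FAI is good
```

On these assumptions the disagreement reduces to whether the hellscape-to-FAI magnitude ratio exceeds the break-even ratio, which is the asymmetry question raised in the replies.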
So basically you are saying that the situation is asymmetric: the impact/magnitude of possible bad things is much much greater than the impact/magnitude of possible good things. Is this correct?
Yeah. One sign of asymmetry is that creating two universes, one filled with pleasure and the other filled with pain, feels strongly negative rather than symmetric to us. Another sign is that pain is an internal experience, while our values might refer to the external world (though it’s very murky), so the former might be much easier to achieve. Another sign is that in our world it’s much easier to create a life filled with pain than a life that fulfills human values.
Yes, many people intuitively feel that a universe of pleasure and a universe of pain add to a net negative. But I suspect that’s just a result of experiencing (and avoiding) lots of sources of extreme pain in our lives, while sources of pleasure tend to be diffuse and relatively rare. The human experience of pleasure is conjunctive because in order to survive and reproduce you must fairly reliably avoid all types of extreme pain. But in a pleasure-maximizing environment, removing pain will be a given.
It’s also true that our brains tend to adapt to pleasure over time, but that seems simple to modify once physiological constraints are removed.
“one filled with pleasure and the other filled with pain, feels strongly negative rather than symmetric to us”
Comparing pains and pleasures of similar magnitude? People have a tendency not to do this, see the linked thread.
“Another sign is that pain is an internal experience, while our values might refer to the external world (though it’s very murky)”
You accept pain and risk of pain all the time to pursue various pleasures, desires and goals. Mice will cross electrified surfaces for tastier treats.
If you’re going to care about hedonic states as such, why treat the external case differently?
Alternatively, if you’re going to dismiss pleasure as just an indicator of true goals (e.g. that pursuit of pleasure as such is ‘wireheading’) then why not dismiss pain in the same way, as just a signal and not itself a goal?
Comparing pains and pleasures of similar magnitude?
My point was comparing pains and pleasures that could be generated with similar amount of resources. Do you think they balance out for human decision making? For example, I’d strongly disagree to create a box of pleasure and a box of pain, do you think my preference would go away after extrapolation?
“My point was comparing pains and pleasures that could be generated with similar amount of resources. Do you think they balance out for human decision making?”
I think with current tech it’s cheaper and easier to wirehead to increase pain (i.e. torture) than to increase pleasure or reduce pain. This makes sense biologically: since organisms won’t go looking for ways to wirehead to maximize their own pain, evolution doesn’t need to ‘hide the keys’ as much as with pleasure or pain relief (where the organism would actively seek out easy means of subverting the behavioral functions of the hedonic system). Thus when powerful addictive drugs are available, such as alcohol, human populations evolve increased resistance over time. The sex systems evolve to make masturbation less rewarding than reproductive sex under ancestral conditions, desire for play/curiosity is limited by boredom, delicious foods become less pleasant when one is full or when the foods are not later associated with signals from nutritional sensors in the stomach, etc.
I don’t think this is true with fine control over the nervous system (or a digital version) to adjust felt intensity and behavioral reinforcement. I think with that sort of full access one could easily increase the intensity (and ease of activation) of pleasures/mood such that one would trade them off against the most intense pains at ~parity per second, and attempts at subjective comparison when or after experiencing both would put them at ~parity.
People will willingly undergo very painful jobs and undertakings for money, physical pleasures, love, status, childbirth, altruism, meaning, etc. Unless you have a different standard for the ‘boxes’ than the one used in subjective comparison with rich experience of the things to be compared, I think we are just haggling over the price re intensity.
We know the felt caliber and behavioral influence of such things can vary greatly. It would be possible to alter nociception and pain receptors to amp up or damp down any particular pain. This could even involve adding a new sense, e.g. someone with congenital deafness could be given the ability to hear (installing new nerves and neurons), and hear painful sounds, with artificially set intensity of pain. Likewise one could add a new sense (or dial one up) to enable stronger pleasures. I think that both the new pains and new pleasures would ‘count’ to the same degree (and if you’re going to dismiss the pleasures as ‘wireheading’ then you should dismiss the pains too).
“For example, I’d strongly disagree to create a box of pleasure and a box of pain, do you think my preference would go away after extrapolation?”
You trade off pain and pleasure in your own life, are you saying that the standard would be different for the boxes than for yourself?
What are you using as the examples to represent the boxes, and have you experienced them? (As discussed in my link above, people often use weaksauce examples in such comparison.)
We could certainly make agents for whom pleasure and pain would use equal resources per util. The question is if human preferences today (or extrapolated) would sympathize with such agents to the point of giving them the universe. Their decision-making could look very inhuman to us. If we value such agents with a discount factor, we’re back at square one.
That’s what the congenital deafness discussion was about.
You have preferences over pain and pleasure intensities that you haven’t experienced, or new durations of experiences you know. Otherwise you wouldn’t have anything to worry about re torture, since you haven’t experienced it.
Consider people with pain asymbolia:
Pain asymbolia is a condition in which pain is perceived, but with an absence of the suffering that is normally associated with the pain experience. Individuals with pain asymbolia still identify the stimulus as painful but do not display the behavioral or affective reactions that usually accompany pain; no sense of threat and/or danger is precipitated by pain.
Suppose you currently had pain asymbolia. Would that mean you wouldn’t object to pain and suffering in non-asymbolics? What if you personally had only happened to experience extremely mild discomfort while having lots of great positive experiences? What about for yourself? If you knew you were going to get a cure for your pain asymbolia tomorrow would you object to subsequent torture as intrinsically bad?
We can go through similar stories for major depression and positive mood.
Seems it’s the character of the experience that matters.
Likewise, if you’ve never experienced skiing, chocolate, favorite films, sex, victory in sports, and similar things that doesn’t mean you should act as though they have no moral value. This also holds true for enhanced experiences and experiences your brain currently is unable to have, like the case of congenital deafness followed by a procedure to grant hearing and listening to music.
Music and chocolate are known to be mostly safe. I guess I’m more cautious about new self-modifications that can change my decisions massively, including decisions about more self-modifications. It seems like if I’m not careful, you can devise a sequence that will turn me into a paperclipper. That’s why I discount such agents for now, until I understand better what CEV means.
conflict seems to be the most plausible scenario (and one which has a high prior placed on it as we can observe that much suffering today is caused by conflict), but it seems to be less and less likely of a scenario once you factor in superintelligence, as multi-polar scenarios seem to be either very short-lived or unlikely to happen at all.
This seems plausible but not obvious to me. Humans are superintelligent as compared to chimpanzees (let alone, say, Venus flytraps), but humans have still formed a multipolar civilization.
When thinking about whether s-risk scenarios are tied to or come about by similar means as x-risk scenarios (such as a malign intelligence explosion), the relevant issue to me seems to be whether or not such a scenario could result in a multi-polar conflict of cosmic proportions. I think the chance of that happening is quite low, since intelligence explosions seem to be most likely to result in a singleton.
Due to complexity and fragility of human values, any superintelligence that fulfills them will probably be adjacent in design space to many other superintelligences that cause lots of suffering (which is also much cheaper), so a wrong superintelligence might take over due to human error or malice or arms races. That’s where most s-risk is coming from, I think. The one in a million number seems optimistic, actually.
Why so? Flipping the sign doesn’t get you “adjacent”, it gets you “diametrically opposed”.
If you really want chocolate ice cream, “adjacent” would be getting strawberry ice cream, not having ghost pepper extract poured into your mouth.
They said “adjacent in design space”. The Levenshtein distance between
return val;
and
return -val;
is 1.
So being served a cup of coffee and being served a cup of pure capsaicin are “adjacent in design space”? Maybe, but funny how that problem doesn’t arise or even worry anyone...
More like driving to the store and driving into the brick wall of the store are adjacent in design space.
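The edit-distance claim in this exchange can be checked directly. Below is a minimal sketch of the standard dynamic-programming Levenshtein distance (the function name is my own, not from any library), applied to the two statements being compared.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning the first i chars of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (or keep if equal)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("return val;", "return -val;"))  # → 1 (one inserted "-")
```

This only measures distance between source strings; whether nearby programs are comparably likely to be built or deployed is the separate question the replies are arguing about.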