I feel a weird disconnect on reading comments like this. I thought s-risks were a part of conventional wisdom on here all along. (We even had an infamous scandal that concerned one class of such risks!) Scott didn’t “see it before the rest of us”—he was drawing on an existing, and by now classical, memeplex.
It’s like when some people spoke as if nobody had ever thought of AI risk until Bostrom wrote Superintelligence—even though that book just summarized what people (not least of whom Bostrom himself) had already been saying for years.
I guess I didn’t think about it carefully before. I assumed that s-risks were much less likely than x-risks (true) so it’s okay not to worry about them (false). The mistake was that logical leap.
In terms of utility, the landscape of possible human-built superintelligences might look like a big flat plain (paperclippers and other things that kill everyone without fuss), with a tall sharp peak (FAI) surrounded by a pit that’s astronomically deeper (many almost-FAIs and other designs that sound natural to humans). The pit needs to be compared to the peak, not the plain. If the pit is more likely, I’d rather have the plain.
Was it obvious to you all along?
Didn’t you realize this yourself back in 2012?
I didn’t realize then that the disutility of a human-built AI can be much larger than the utility of FAI, because pain is easier to achieve than human utility (which doesn’t reduce to pleasure). That makes the argument much stronger.
This argument doesn’t actually seem to be in the article that Kaj linked to. Did you see it somewhere else, or come up with it yourself? I’m not sure it makes sense, but I’d like to read more if it’s written up somewhere. (My objection is that “easier to achieve” doesn’t necessarily mean the maximum value achievable is higher. It could be that it would take longer or more effort to achieve the maximum value, but the actual maximums aren’t that different. For example, maybe the extra stuff needed for human utility (aside from pleasure) is complex but doesn’t actually cost much in terms of mass/energy.)
The argument somehow came to my mind yesterday, and I’m not sure it’s true either. But do you really think human value might be as easy to maximize as pleasure or pain? Pain is only about internal states, and human value seems to be partly about external states, so it should be way more expensive.
One of the more crucial points, I think, is that positive utility is – for most humans – complex and its creation is conjunctive. Disutility, in contrast, is disjunctive. Consequently, the probability of creating the former is smaller than that of the latter – all else being equal (of course, all else is not equal).
In other words, the scenarios leading towards the creation of (large amounts of) positive human value are conjunctive: to create a highly positive future, we have to eliminate (or at least substantially reduce) physical pain and boredom and injustice and loneliness and inequality (at least certain forms of it) and death, etc. etc. etc. (You might argue that getting “FAI” and “CEV” right would accomplish all those things at once (true) but getting FAI and CEV right is, of course, a highly conjunctive task in itself.)
In contrast, disutility is much more easily created and essentially disjunctive. Many roads lead towards dystopia: sadistic programmers or failing AI safety wholesale (or “only” value-loading or extrapolating, or stable self-modification), or some totalitarian regime takes over, etc. etc.
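The conjunctive/disjunctive asymmetry can be put in toy numbers. In the sketch below, every figure is a made-up assumption purely for illustration: a good outcome that needs all of n conditions to hold is a product of probabilities, while a bad outcome that needs only one of n failure modes is one minus a product.

```python
# Toy model of the conjunctive/disjunctive asymmetry.
# Every number here is an illustrative assumption, not an estimate.

# A highly positive outcome requires ALL necessary conditions to hold
# (conjunctive), so its probability is a product:
p_condition = 0.9        # hypothetical chance each condition is met
n_conditions = 10
p_good = p_condition ** n_conditions          # 0.9**10 ≈ 0.35

# A very bad outcome needs only ONE of many failure modes (disjunctive),
# so its probability is one minus a product:
p_failure = 0.05         # hypothetical chance of each failure mode
n_failures = 10
p_bad = 1 - (1 - p_failure) ** n_failures     # ≈ 0.40

print(f"P(good, conjunctive) = {p_good:.2f}")  # 0.35
print(f"P(bad, disjunctive)  = {p_bad:.2f}")   # 0.40
```

Even though each individual failure mode here (5%) is far less likely than each individual success (90%), the disjunctive bad outcome ends up more probable than the conjunctive good one.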
It’s also not a coincidence that even the most untalented writer with the most limited imagination can conjure up a convincing dystopian society. Envisioning a true utopia in concrete detail, on the other hand, is nigh impossible for most human minds.
Footnote 10 of the above-mentioned s-risk article makes a related point (emphasis mine):
“[...] human intuitions about what is valuable are often complex and fragile (Yudkowsky, 2011), taking up only a small area in the space of all possible values. In other words, the number of possible configurations of matter constituting anything we would value highly (under reflection) is arguably smaller than the number of possible configurations that constitute some sort of strong suffering or disvalue, making the incidental creation of the latter ceteris paribus more likely.”
Consequently, UFAIs such as paperclippers are more likely to create large amounts of disutility than utility (factoring out acausal considerations) incidentally (e.g. because creating simulations is instrumentally useful for them).
Generally, I like how you put it in your comment here:

In terms of utility, the landscape of possible human-built superintelligences might look like a big flat plain (paperclippers and other things that kill everyone without fuss), with a tall sharp peak (FAI) surrounded by a pit that’s astronomically deeper (many almost-FAIs and other designs that sound natural to humans). The pit needs to be compared to the peak, not the plain. If the pit is more likely, I’d rather have the plain.
Yeah. In a nutshell, supporting generic x-risk-reduction (which also reduces extinction risks) is in one’s best interest, if and only if one’s own normative trade-ratio of suffering vs. happiness is less suffering-focused than one’s estimate of the ratio of expected future happiness to suffering (feel free to replace “happiness” with utility and “suffering” with disutility). If one is more pessimistic about the future or if one needs large amounts of happiness to trade off small amounts of suffering, one should rather focus on s-risk-reduction instead. Of course, this simplistic analysis leaves out issues like cooperation with others, neglectedness, tractability, moral uncertainty, acausal considerations, etc.
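The criterion above can be sketched in toy numbers (the function name and all figures below are illustrative assumptions, not anyone’s actual estimates): measure one’s trade ratio as the units of happiness required to offset one unit of suffering, and compare it to the expected happiness-to-suffering ratio of the future.

```python
# Hedged sketch of the trade-ratio criterion; all numbers are
# illustrative assumptions.

def favors_x_risk_reduction(trade_ratio, exp_happiness, exp_suffering):
    """True iff the expected future is net-positive under one's ethics.

    trade_ratio: units of happiness needed to offset one unit of
    suffering (larger = more suffering-focused).
    """
    return exp_happiness / exp_suffering > trade_ratio

# A mildly suffering-focused optimist supports generic x-risk reduction:
print(favors_x_risk_reduction(trade_ratio=2,
                              exp_happiness=100, exp_suffering=10))   # True

# A strongly suffering-focused view instead favors s-risk reduction:
print(favors_x_risk_reduction(trade_ratio=50,
                              exp_happiness=100, exp_suffering=10))   # False
```

The same expected future (10:1 happiness to suffering here) flips from worth preserving to not, purely as a function of the normative trade ratio.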
Do you think that makes sense?
Yeah, I also had the idea about utility being conjunctive and mentioned it in a deleted reply to Wei, but then realized that Eliezer’s version (fragility of value) already exists and is better argued.
On the other hand, maybe the worst hellscapes can be prevented in one go, if we “just” solve the problem of consciousness and tell the AI what suffering means. We don’t need all of human value for that. Hellscapes without suffering can also be pretty bad in terms of human value, but not quite as bad, I think. Of course solving consciousness is still a very tall order, but it might be easier than solving all philosophy that’s required for FAI, and it can lead to other shortcuts like in my recent post (not that I’d propose them seriously).
Some people at MIRI might be thinking about this under nonperson predicate. (Eliezer’s view on which computations matter morally is different from the one endorsed by Brian, though.) And maybe it’s important to not limit FAI options too much by preventing mindcrime at all costs – if there are benefits against other very bad failure modes (or – cooperatively – just increased controllability for the people who care a lot about utopia-type outcomes), maybe some mindcrime in the early stages to ensure goal-alignment would be the lesser evil.
Human disutility includes more than just pain too. Destruction of humanity (the flat plain you describe) carries a great deal of negative utility for me, even if I disappear without feeling any pain at all. There’s more disutility if all life is destroyed, and more if the universe as a whole is destroyed… I don’t think there’s any fundamental asymmetry. Pain and pleasure are the most immediate ways of affecting value, and probably the ones that can be achieved most efficiently in computronium, so external states probably don’t come into play much at all if you take a purely utilitarian view.
Our values might say, for example, that a universe filled with suffering insects is very undesirable, but a universe filled with happy insects isn’t very desirable. More generally, if our values are a conjunction of many different values, then it’s probably easier to create a universe where one is strongly negative and the rest are zero, than a universe where all are strongly positive. I haven’t seen the argument written up, I’m trying to figure it out now.
Huh, I feel very differently. For AI risk specifically, I thought the conventional wisdom was always “if AI goes wrong, the most likely outcome is that we’ll all just die, and the next most likely outcome is that we get a future which somehow goes against our values even if it makes us very happy.” And besides AI risk, other x-risks haven’t really been discussed at all on LW. I don’t recall seeing any argument for s-risks being a particularly plausible category of risks, let alone one of the most important ones.
It’s true that there was That One Scandal, but the reaction to that was quite literally Let’s Never Talk About This Again—or alternatively Let’s Keep Bringing This Up To Complain About How It Was Handled, depending on the person in question—and even then, people only seemed to be talking about that specific incident and argument. I never saw anyone draw the conclusion that “hey, this looks like an important subcategory of x-risks that warrants separate investigation and dedicated work to avoid”.
There was some discussion back in 2012 and sporadically since then. (ETA: You can also do a search for “hell simulations” and get a bunch more results.)
I’ve always thought that in order to prevent astronomical suffering, we will probably want to eventually (i.e., after a lot of careful thought) build an FAI that will colonize the universe and stop any potential astronomical suffering arising from alien origins and/or try to reduce suffering in other universes via acausal trade etc., so the work isn’t very different from other x-risk work. But now that the x-risk community is larger, maybe it does make sense to split out some of the more s-risk specific work?
It seems like the most likely reasons to create suffering come from the existence of suffering-hating civilizations. Do you think that it’s clear/very likely that it is net helpful for there to be more mature suffering-hating civilizations? (On the suffering-focused perspective.)
My intuition is that there is no point in trying to answer questions like these before we know a lot more about decision theory, metaethics, metaphilosophy, and normative ethics, so pushing for a future where these kinds of questions eventually get answered correctly (and the answers make a difference in what happens) seems like the most important thing to do. It doesn’t seem to make sense to try to lock in some answers (i.e., make our civilization suffering-hating or not suffering-hating) on the off chance that when we figure out what the answers actually are, it will be too late. Someone with much less moral/philosophical uncertainty than I do would perhaps prioritize things differently, but I find it difficult to motivate myself to think really hard from their perspective.
This question seems like a major input into whether x-risk reduction is useful.
If we try to answer the question now, it seems very likely we’ll get the answer wrong (given my state of uncertainty about the inputs that go into the question). I want to keep civilization going until we know better how to answer these types of questions. For example if we succeed in building a correctly designed/implemented Singleton FAI, it ought to be able to consider this question at leisure, and if it becomes clear that the existence of mature suffering-hating civilizations actually causes more suffering to be created, then it can decide to not make us into a mature suffering-hating civilization, or take whatever other action is appropriate.
Are you worried that by the time such an FAI (or whatever will control our civilization) figures out the answer, it will be too late? (Why? If we can decide that x-risk reduction is bad, then so can it. If it’s too late to alter or end civilization at that point, why isn’t it already too late for us?) Or are you worried more that the question won’t be answered correctly by whatever will control our civilization?
If you are concerned exclusively with suffering, then increasing the number of mature civilizations is obviously bad and you’d prefer that the average civilization not exist. You might think that our descendants are particularly good to keep around, since we hate suffering so much. But in fact almost all s-risks occur precisely because of civilizations that hate suffering, so it’s not at all clear that creating “the civilization that we will become on reflection” is better than creating “a random civilization” (which is bad).
To be clear, even if we have modest amounts of moral uncertainty I think it could easily justify a “wait and see” style approach. But if we were committed to a suffering-focused view then I don’t think your argument works.
It seems just as plausible to me that suffering-hating civilizations reduce the overall amount of suffering in the multiverse, so I think I’d wait until it becomes clear which is the case, even if I was concerned exclusively with suffering. But I haven’t thought about this question much, since I haven’t had a reason to assume an exclusive concern with suffering, until you started asking me to.
Earlier in this thread I’d been speaking from the perspective of my own moral uncertainty, not from a purely suffering-focused view, since we were discussing the linked article, and Kaj had written:

The article isn’t specifically negative utilitarian, though—even classical utilitarians would agree that having astronomical amounts of suffering is a bad thing. Nor do you have to be a utilitarian in the first place to think it would be bad: as the article itself notes, pretty much all major value systems probably agree on s-risks being a major Bad Thing.
What’s your reason for considering a purely suffering-focused view? Intellectual curiosity? Being nice to or cooperating with people like Brian Tomasik by helping to analyze one of their problems?
Understanding the recommendations of each plausible theory seems like a useful first step in decision-making under moral uncertainty.
Perhaps this, in case it turns out to be highly important but difficult to get certain ingredients – e.g. priors or decision theory – exactly right. (But I have no idea, it’s also plausible that suboptimal designs could patch themselves well, get rescued somehow, or just have their goals changed without much fuss.)
That sort of subject is inherently implicit in the kind of decision-theoretic questions that MIRI-style AI research involves. More generally, when one is thinking about astronomical-scale questions, and aggregating utilities, and so on, it is a matter of course that cosmically bad outcomes are as much of a theoretical possibility as cosmically good outcomes.
Now, the idea that one might need to specifically think about the bad outcomes, in the sense that preventing them might require strategies separate from those required for achieving good outcomes, may depend on additional assumptions that haven’t been conventional wisdom here.
Right, I took this idea to be one of the main contributions of the article, and assumed that this was one of the reasons why cousin_it felt it was important and novel.
Thanks for voicing this sentiment I had upon reading the original comment. My impression was that negative utilitarian viewpoints / things of this sort had been trending for far longer than cousin_it’s comment might suggest.
The article isn’t specifically negative utilitarian, though—even classical utilitarians would agree that having astronomical amounts of suffering is a bad thing. Nor do you have to be a utilitarian in the first place to think it would be bad: as the article itself notes, pretty much all major value systems probably agree on s-risks being a major Bad Thing:

All plausible value systems agree that suffering, all else being equal, is undesirable. That is, everyone agrees that we have reasons to avoid suffering. S-risks are risks of massive suffering, so I hope you agree that it’s good to prevent s-risks.
Yes, but the claim that that risk needs to be taken seriously is certainly not conventional wisdom around here.
Decision theory (which includes the study of risks of that sort) has long been a core component of AI-alignment research.
No, it doesn’t. Decision theory deals with abstract utility functions. It can talk about outcomes A, B, and C where A is preferred to B and B is preferred to C, but doesn’t care whether A represents the status quo, B represents death, and C represents extreme suffering, or whether A represents gaining lots of wealth and status, B represents the status quo, and C represents death, so long as the ratios of utility differences are the same in each case. Decision theory has nothing to do with the study of s-risks.
The first and last sentences of the parent comment do not follow from the statements in between.
That doesn’t seem to refute or change what Alex said?
What Alex said doesn’t seem to refute or change what I said.
But also: I disagree with the parent. I take conventional wisdom here to include support for MIRI’s agent foundations agenda, which includes decision theory, which includes the study of such risks (even if only indirectly or implicitly).
Fair enough. I guess I didn’t think carefully about it before. I assumed that s-risks were much less likely than x-risks (true) and so they could be discounted (false). It seems like the right way to imagine the landscape of superintelligences is a vast flat plain (paperclippers and other things that kill everyone without fuss) with a tall thin peak (FAIs) surrounded by a pit that’s astronomically deeper (FAI-adjacent and other designs). The right comparison is between the peak and the pit, because if the pit is more likely, I’d rather have the plain.