So I don’t have much experience with philosophy; this is mainly a collection of my thoughts as I read through.
1) S-risks seem to basically describe hellscapes, situations of unimaginable suffering. Is that about right?
2) Two assumptions here seem to be valuing future sentience and the additive nature of utility/suffering. Are these typical stances to take? Should there be some sort of discounting happening here?
3) I’m pretty sure I’m strawmanning here, but I can’t help but feel like there’s some sort of argument by definition here, where we first defined s-risks as the worst things possible, then concluded that we should work on them because EAs might want to avert the worst things possible. It seems...a little vacuous?
4) In UNSONG, someone mentioned that Thamiel is basically an anti-Friendly AI in that he’s roughly the inverse of our human values. That is, actual Unfriendliness (i.e. an AI designed to maximize suffering) seems to subtly encode a lot of dense information about human suffering, much the same way that Friendliness does. So I guess I’m trying to say that causing s-risks to happen actually seems to be a pretty hard problem, at least one that requires far more nuanced models than merely extinction.
In the case of AI going wrong, I currently find it far more plausible that extinction happens, rather than a hellscape scenario. It seems to me that we’d need to get like 90% of the way to alignment and then take a sharp turn for s-risks to happen, and given that we haven’t really made much substantial progress in alignment, I guess I’m unconvinced?
5) Oh, wait, looks like you covered point 4 about 3/4ths of the way down the page.
6) Additional arguments for s-risks seem to be based upon the suffering of other potential sapient beings we create. I haven’t read Tomasik’s stuff, so I can’t say that much here, except that it seems to me that sapience might not equal capacity for suffering?
7) Your conclusion seems a little strong. I agree that conflicts can cause localized suffering (e.g. torturing people during wartime), but the arguments seem to rest quite a bit on proposed future sentient beings, which, I dunno, don’t seem as imminent? (For context, I’m worried about x-risks because projections in the next 100 years spread across things like climate change and more unpredictable things like AI paint a fairly bleak picture.)
8) I’m just noting that, should any x-risk come to pass, this solves the s-risk problem for humans / things related to humans. But there could just as well be sapient aliens suffering elsewhere, I guess.
causing s-risks to happen actually seems to be a pretty hard problem, at least one that requires far more nuanced models than merely extinction.
To maximize human suffering per unit of space-time, you need a good model of human values, just like a Friendly AI.
But to create an astronomical amount of human suffering (without really maximizing it), you only need to fill an astronomical amount of space-time with humans living in bad conditions, and prevent them from escaping those conditions. Relatively easier.
Instead of Thamiel, imagine immortal Pol Pot with space travel.
Ah, okay. Thanks for the clarification here.