“S-risk” means “risk of astronomical amounts of suffering”. Typically people are imagining crazy things like Dyson-sphere-ing every star in the local supercluster in order to create 1e40 (or whatever) person-years of unimaginable torture.
If the outcome is “merely” trillions of person-years of intense torture, then that maybe still qualifies as an s-risk. Billions, probably not. We can just call it “very very bad”. Not all very very bad things are s-risks.
Does that help clarify why I think Reward Button Alignment poses very low s-risk?
Yeah, I agree that it wouldn't be a very bad kind of s-risk. The way I thought about s-risk was more like expected amount of suffering. But yeah, I agree with you that it's not that bad, and perhaps most expected suffering comes from more active utility-inverting threats or values.
(Though tbc, I was totally imagining 1e40 humans being forced to press reward buttons.)