in particular, an aligned AI sells more of its lightcone to get baby-eating aliens to eat their babies less, and in general a properly aligned AI will try its hardest to ensure what we care about (including reducing suffering) is satisfied, so alignment is convergent to both.
Those are some good properties, I think… Not quite sure in the end.
But your alignment procedure is indirect, so we don’t quite know today what the result will be, right? Then the question of whether we’ll end up on an s-line depends on all the complexity that usually comes with games with many participants. In this case the s-line results from the goals of another agent who is open to trade (hasn’t irrevocably committed). But there are many other paths to s-lines. (Am I using the -line nomenclature correctly? First time I heard about it. What are p-lines?)
(note that another reason i don’t think about S-risks too much is that i don’t think my mental health could handle worrying about them a lot, and i need all the mental health i can get to solve alignment.)
In my experience, the content of what one thinks about gets abstracted away at some point so that you cease to think about the suffering itself. Took about 5 years for me though… (2010–15)
yes, the eventual outcome is hard to predict. but my plan looks like the kind of plan that would fail in X-risky rather than S-risky ways, when it fails.
i don’t use the Thing-line nomenclature very much anymore and i only use U/X/S.
i am concerned about the other paths as well but i’m hopeful we can figure them out within the QACI counterfactuals.
Yeah, very much agreed. :-/