I think that humans are sorta “unaligned”, in the sense of being vulnerable to Goodhart’s Law.
A lot of moral philosophy is something like:
Gather our odd grab bag of heterogeneous, inconsistent moral intuitions
Try to find a coherent “theory” that encapsulates and generalizes these moral intuitions
Work through the consequences of the theory and modify it until you are willing to bite all the implied bullets.
The resulting ethical system often ends up having some super bizarre implications and usually requires specifying “free variables” that are (arguably) independent of our original moral intuitions.
In fact, I imagine that optimizing the universe according to my moral framework looks quite Goodhartian to many people.
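The Goodhart dynamic here can be made concrete with a toy numerical sketch (my own illustration, not something from this post): the `true_value` function, the linear "theory" fit, and the policy range below are all made up for demonstration. The idea is just to fit a simple theory to intuitions sampled from ordinary cases, then optimize that theory far outside the range the intuitions came from.

```python
# Toy sketch of "fit a theory to intuitions, then optimize it hard".
# Everything here (true_value, the linear fit, the 0-10 policy range)
# is a made-up illustration, not a model of any real ethical theory.
import numpy as np

rng = np.random.default_rng(0)

def true_value(x):
    # Hypothetical "actual" goodness of a policy parameterized by x:
    # peaks at x = 2, collapses at the extremes.
    return np.exp(-((x - 2.0) ** 2))

# "Intuitions": noisy observations of true value on everyday cases
# (0 <= x <= 2), where more x genuinely is better.
xs = rng.uniform(0.0, 2.0, size=40)
ys = true_value(xs) + rng.normal(0.0, 0.02, size=xs.shape)

# The "theory": a straight-line fit that generalizes "more x is better".
theory = np.polynomial.Polynomial.fit(xs, ys, deg=1)

# Optimize the theory over a much wider space of policies.
candidates = np.linspace(0.0, 10.0, 1_001)
best = candidates[np.argmax(theory(candidates))]

print(f"theory recommends x = {best:.1f}")            # picks the extreme: x = 10
print(f"theory's predicted value there: {theory(best):.2f}")
print(f"true value there: {true_value(best):.6f}")     # ~0: Goodharted
```

Run as written, the theory recommends the extreme x = 10, where its predicted value is high but the true value is essentially zero. The analogy is loose, but it's the same shape of failure as biting every bullet a generalized theory implies.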
Some examples:
I think that (a) personhood is preserved when a person is moved into simulation, and (b) a simulation is easier to control. Therefore, it’d be ideal to upload as many people as possible. In fact, I’m not sure whether this should even be optional, given how inefficient the ratio of organic human atoms to “utilons” is.
I value future lives, so I think we have an ethical responsibility to create as many happy beings as we can, even at some cost to current beings.
I think that some people are fundamentally capable of being happier than other people. So, all else equal, we should prefer to create happier people. I think that parents should be forced to adhere to this when having kids.
I think that we should modify all animals so we can guarantee that they have zero consciousness.
I think that people ought to do some limited amount of wire-heading (broadly, increasing happiness independently of reality).
Complete self-determination/subjective “free will” is both impossible and undesirable. A superintelligent AI (SAI) will be able to subtly, but meaningfully, guide humans down chosen paths because it can robustly predict the differential impact of seemingly minor conversational and environmental variations.
I’m sure there are many other examples.
I don’t think that my conclusions are wrong per se, but… my ethical system has some alien and potentially degenerate implications when optimized hard.
No real call to action here, just some observations. Existing human ethical systems might look as exotic to the average person as some conclusions drawn by a kinda-aligned SAI.