One thing I want people to know about AI alignment is that I wish people would stop referring to human values as though they were a singular object or list, and instead think in terms of an individual human’s values, without presupposing that humanity must commonly care about any particular values at all.
A minor example: I believe the felt complexity is somewhat overstated by lumping multiple humans together, though I don’t think it changes the broader argument around the complexity of value.
A more major example is that it undermines the viability of ideas like CEV, and importantly makes aggregative/multi-alignment quite a lot harder than single-single alignment.
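To make the aggregation side of this concrete, here is a minimal toy sketch in Python (the three people and their value rankings are invented purely for illustration; the point is the Condorcet-style cycle, not anything specific to CEV):

```python
from itertools import combinations

# Three people, each with a perfectly coherent ranking of three outcomes
# (best first). Names and rankings are made up for this example.
rankings = {
    "alice": ["liberty", "equality", "tradition"],
    "bob":   ["equality", "tradition", "liberty"],
    "carol": ["tradition", "liberty", "equality"],
}

def majority_prefers(a: str, b: str) -> bool:
    """True if a strict majority ranks outcome a above outcome b."""
    votes = sum(1 for r in rankings.values() if r.index(a) < r.index(b))
    return votes > len(rankings) / 2

for a, b in combinations(["liberty", "equality", "tradition"], 2):
    winner, loser = (a, b) if majority_prefers(a, b) else (b, a)
    print(f"a majority prefers {winner} over {loser}")

# Prints a cycle: liberty beats equality, equality beats tradition, and
# tradition beats liberty. Every individual is coherent, yet there is no
# single "humanity's ranking" to hand to an optimizer.
```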
Agreed. I think this is why we should focus on working towards a society which emphasizes freedom and the absence of physical assault (including with weapons of mass destruction). Within such a framework, it should be possible to have an archipelago of diverse societies with diverse rulesets.
If there were a ‘guardian AI’ system, I’d want it to mostly just keep us from either killing each other or overthrowing it, and leave the rest up to us. This moves away from the idea of an overbearing nanny, and towards a ‘let them choose their own path’ design.
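A minimal sketch of that division of labour, with an invented `Action` type and a made-up constraint list (reliably recognizing such violations is, of course, the actual hard part):

```python
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    kills_people: bool = False
    deploys_wmd: bool = False
    subverts_guardian: bool = False

# The guardian vetoes only actions that cross a short list of hard constraints
# and says nothing about anything else, including how societies run themselves.
HARD_CONSTRAINTS = {
    "lethal violence": lambda a: a.kills_people,
    "weapons of mass destruction": lambda a: a.deploys_wmd,
    "overthrowing the guardian": lambda a: a.subverts_guardian,
}

def guardian_review(action: Action) -> str:
    violated = [name for name, check in HARD_CONSTRAINTS.items() if check(action)]
    if violated:
        return "vetoed: " + ", ".join(violated)
    return "permitted (the guardian has no opinion on this)"

print(guardian_review(Action("found a new society with unusual rules")))
print(guardian_review(Action("build a doomsday device", deploys_wmd=True)))
```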
“Human values” is a sort of objects, a category. Humans can value, for example, forgiveness or revenge; these are opposites, but both have a distinct quality that separates them from paperclips.
Yes, these values are all different from each other, but a crux is that I don’t think the differing values amongst humans are so distinct from paperclips that it’s worth blurring the differences between them, especially under very strong optimization, though I agree that human values trivially form a sort, in the sense of a set of objects.
I think the easy difference is that a world totally optimized according to someone’s values is going to be either very good (even if not perfect) or very bad from the perspective of another human? I wouldn’t say it’s impossible, but it would take a very specific combination of human values for such a world to be exactly as valuable as turning everything into paperclips, not worse, not better.
My best (very uncertain) guess is that human values are defined through some relation of states of consciousness to social dynamics?
I mostly agree with this, with the caveat that a paperclip outcome can happen, it just isn’t very likely.
(For example, radical eco-green views in which humans have to go extinct so nature can heal definitely exist, and would be a paperclip outcome from my perspective.)
I was also talking about outcomes that are very bad from the perspective of another human, since I think this is surprisingly important when dealing with AI safety.
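A toy numerical sketch of this exchange, with everything invented for illustration: each person gets positive weights on a block of ‘shared’ features and random-signed, heavy-tailed weights on ‘contested’ ones, and the paperclip world is modeled as destroying all of it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_shared, n_contested, trials = 20, 20, 10_000

def sample_person():
    shared = rng.uniform(0.5, 1.5, n_shared)      # everyone values these positively
    contested = rng.standard_cauchy(n_contested)  # random sign, occasionally huge
    return shared, contested

a_opt_for_b, clip_for_b = [], []
for _ in range(trials):
    _, a_con = sample_person()     # only A's contested signs matter below, since
    b_sh, b_con = sample_person()  # A's shared weights are positive by construction
    # A world fully optimized for A pushes every feature to the extreme A prefers.
    a_opt_for_b.append(b_sh.sum() + (b_con * np.sign(a_con)).sum())
    # The paperclip world wipes out everything either person cares about.
    clip_for_b.append(-b_sh.sum() - np.abs(b_con).sum())

a_opt_for_b, clip_for_b = np.array(a_opt_for_b), np.array(clip_for_b)
print("B's value of A's optimum, 5th/50th/95th pct:",
      np.percentile(a_opt_for_b, [5, 50, 95]).round(1))
print("B's value of paperclips,  5th/50th/95th pct:",
      np.percentile(clip_for_b, [5, 50, 95]).round(1))
print("fraction of A-optima that are net-negative for B:",
      (a_opt_for_b < 0).mean().round(3))

# In this toy, A's optimum is usually positive for B, a minority of runs are
# strongly negative (clashing contested weights), and none reach the paperclip
# level, which destroys the shared features as well.
```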
I like Yudkowsky’s toy example of tasking an AGI with copying a single strawberry, at the molecular level, without destroying the world as a side effect.
I also like it for this reason, though I personally think that a lot of the challenge is in being capable enough to do it at all, rather than in us being unable to keep it from destroying the world.
Still, I kind of like the toy example.