I think the distance from human values or complexity of values is not a crux, as web/books corpus overdetermines them in great detail (for corrigibility purposes). It’s mostly about alignment by default, whether human values in particular can be noticed in there, or if correctly specifying how to find them is much harder than finding some other deceptively human-value-shaped thing. If they can be found easily once there are tools to go looking for them at all, it doesn’t matter how complex they are or how important it is to get everything right, that happens by default.
But also there is this pervasive assumption of it being possible to formulate values in closed form, as tractable finite data, which occasionally fuels arguments. Like, value is said to be complex, but of finite complexity. In an open environment, this doesn’t need to be the case, a code/data distinction is only salient when we can make important conclusions by only looking at code and not at data. In an open environment, data is unbounded, can’t be demonstrated all at once. So it doesn’t make much sense to talk about complexity of values at all, without corrigibility alignment can’t work out anyway.
I think the distance from human values or complexity of values is not a crux, as web/books corpus overdetermines them in great detail (for corrigibility purposes). It’s mostly about alignment by default, whether human values in particular can be noticed in there, or if correctly specifying how to find them is much harder than finding some other deceptively human-value-shaped thing. If they can be found easily once there are tools to go looking for them at all, it doesn’t matter how complex they are or how important it is to get everything right, that happens by default.
But also there is this pervasive assumption of it being possible to formulate values in closed form, as tractable finite data, which occasionally fuels arguments. Like, value is said to be complex, but of finite complexity. In an open environment, this doesn’t need to be the case, a code/data distinction is only salient when we can make important conclusions by only looking at code and not at data. In an open environment, data is unbounded, can’t be demonstrated all at once. So it doesn’t make much sense to talk about complexity of values at all, without corrigibility alignment can’t work out anyway.