I think the problem with human values is more underdetermination than fragility.
Base human preferences and meta-preferences are allowed to fuzzily span (interpersonally, and intrapersonally between slight variations in context and random noise, and intrapersonally between different ways of ascribing values to models of human behavior) a set of initial conditions that, when run forward, resolve internal conflicts differently and sometimes can end up in mutually-disagreeable end states. So it doesn’t make much sense to try to apply a basin of attraction argument directly to human values, because the target doesn’t stay in a basin.
Legitimacy is a bundle of meta-preferences—it’s about how we want human-modeling, resolution of internal conflicts, etc. to operate. It has no special exception—humans can be ascribed conflicting notions of legitimacy, that lead to different (and potentially mutually disagreeable) end states upon being used. Also bad at being a basin.
Mostly I think we just have to do our best—we have to apply the notions of legitimacy we have to the project of resolving the internal conflicts in our notion of legitimacy. We probably shouldn’t demand guarantees for a bunch of individual principles, because our principles can conflict. Seems better to gradually make choices that seem locally legitimate—maybe even choosing a method that we’re confident will converge rather than explode, even as we’re aware that slightly different versions of it would converge to different things.
I think the problem with human values is more underdetermination than fragility.
Base human preferences and meta-preferences are allowed to fuzzily span (interpersonally, and intrapersonally between slight variations in context and random noise, and intrapersonally between different ways of ascribing values to models of human behavior) a set of initial conditions that, when run forward, resolve internal conflicts differently and sometimes can end up in mutually-disagreeable end states. So it doesn’t make much sense to try to apply a basin of attraction argument directly to human values, because the target doesn’t stay in a basin.
Legitimacy is a bundle of meta-preferences—it’s about how we want human-modeling, resolution of internal conflicts, etc. to operate. It has no special exception—humans can be ascribed conflicting notions of legitimacy, that lead to different (and potentially mutually disagreeable) end states upon being used. Also bad at being a basin.
Mostly I think we just have to do our best—we have to apply the notions of legitimacy we have to the project of resolving the internal conflicts in our notion of legitimacy. We probably shouldn’t demand guarantees for a bunch of individual principles, because our principles can conflict. Seems better to gradually make choices that seem locally legitimate—maybe even choosing a method that we’re confident will converge rather than explode, even as we’re aware that slightly different versions of it would converge to different things.