I don’t view orthogonality as essential to IABIED-style arguments.
Something can have a desirable attractor state and yet almost certainly fail to reach a sufficient peak of it if the precision requirements are strong enough, whether due to entropy:

- Most organisms carry some level of mutational load
- Most students score less than 100% on tests (even multiple-choice tests with no trick questions)
- Most patrons of a bar will not score perfectly at darts

or path dependence:

- Most species will not adopt a potentially valuable adaptation if it would break from their basic body plan
- Individuals' religious affiliations are heavily influenced by what they grew up with

or other factors.
Things come apart at the tails, and the more power and agency you have, the more consequential are the differences between what you're optimizing for and (objective morality, human CEV, any particular other person's interests, and so on).
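A quick numerical sketch of the tails coming apart (this is only an illustration; the jointly Gaussian scores and the 0.9 correlation are assumptions of mine, not anything load-bearing): even when a proxy correlates 0.9 with the true target, selecting hard on the proxy leaves you noticeably short of the true optimum, and the gap widens as selection gets more extreme.

```python
import numpy as np

# Illustrative sketch (assumed setup): a true target and an optimized proxy
# that correlate strongly overall, to show how they diverge at the extremes.
rng = np.random.default_rng(0)
n = 1_000_000
rho = 0.9  # assumed correlation between the true target and the proxy

cov = [[1.0, rho], [rho, 1.0]]
target, proxy = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Take the points that look best under the proxy (what a strong optimizer selects).
top_k = 100
best_by_proxy = np.argsort(proxy)[-top_k:]

print("mean target score overall:             ", target.mean().round(2))
print("mean target score of proxy-optimal set:", target[best_by_proxy].mean().round(2))
print("best possible target score:            ", target.max().round(2))
# Even at rho = 0.9 the proxy-optimal points fall visibly short of the true
# optimum, and the shortfall grows as the selection pressure increases.
```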
My best guess is that moral realism is true and thus represents an attractor state for minds that reflect on their own actions and preferences; even if you think this is false, aligning to human/company/government CEVs is an attractor state because human institutions are building the machines. But precision requirements may be high enough that catastrophic outcomes are almost certain, unless we become as relevantly good at alignment as (say) flight engineering.