There are two unresolved issues with alignment:
we do not know how to define it precisely enough (there are plenty of naive and not-so-naive attempts, but all have major flaws);
even if we had a definition, we have no way of building an AI that would reliably follow it (today’s AIs are not so much programmed as discovered by an almost blind, semi-random search; they are often “good enough”, but they are never exactly right, and their failure modes are fairly unpredictable).
And yes, even if we had answers to the above two issues, we’d still need to make sure the code implementing them is programmed well. But compared to those two issues, “programming well, once you are willing to spend 100x cost/LoC” is much closer to being a solved problem (using techniques such as formal verification).