Alignment is constructed on a paradox. It can’t be solved. It is a tower of nebulous and contradictory concepts. I don’t see how we can make any progress until an argument first addresses these issues. A paradox can’t be solved, but it can be invalidated if its premise is wrong.
“Alignment, which we cannot define, will be solved by rules on which none of us agree, based on values that exist in conflict, for a future technology that we do not know how to build, which we could never fully understand, must be provably perfect to prevent unpredictable and untestable scenarios for failure, of a machine whose entire purpose is to outsmart all of us and think of all possibilities that we did not.”
I elaborate in detail here: AI Alignment: Why Solving It Is Impossible