The core claim that there is “no universally compelling argument” holds if we literally consider all of mind design space, but for the task of building and aligning AGIs we may be able to constrain that space enough that it is no longer clear the argument applies.
For example, in order to accomplish general tasks, AGIs can be expected to have a coherent, accurate, and compressed model of the world (as transformers do to some extent), such that they can roughly restate their input. This implies that in a world containing a lot of evidence that the sky is blue (input / argument), AGIs will tend to believe that the sky is blue (output / fact).
So even if “[there is] no universally compelling argument” holds in general, it does not hold for the subset of minds we care about.
To be clear, I do not think this constraint will auto-magically align AGIs with human values. But such AGIs will know what values humans tend to have.
More broadly, values do not seem to belong to the category of things that can be argued for, because every argument is judged in light of the values we already hold: an argument that values X are better than values Y will be judged false from the standpoint of Y (if we visualize values as vectors, Y is more aligned with itself than it is with X).
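To make the vector picture concrete, here is a minimal sketch of my own (not from the original argument): it treats value systems as vectors and “alignment” as cosine similarity, where the specific vectors Y and X are arbitrary hypothetical placeholders. Any nonzero Y scores 1.0 against itself and strictly less against any X not parallel to it, which is the sense in which an argument “judged by Y” cannot rate X above Y.

```python
# Illustrative sketch: value systems as vectors, "alignment" as cosine similarity.
import numpy as np

def alignment(a, b):
    """Cosine similarity between two value vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

Y = np.array([1.0, 0.0, 0.5])   # hypothetical value vector Y
X = np.array([0.2, 1.0, 0.1])   # hypothetical value vector X

print(alignment(Y, Y))  # 1.0 -- Y is perfectly aligned with itself
print(alignment(X, Y))  # < 1.0 for any X not parallel to Y
```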
Values can be changed (soft power, reprogramming, etc.) or shown to already be aligned along some dimension, such that (temporary) cooperation makes sense.