You’re correct, but since I define “aligned” as “tending to do what is actually best according to humanity’s value system”, and given that taking such a risk would be harmful, a totally aligned AGI would not, in fact, take that risk lol. So although your addition is important to note, there’s a sense in which it is redundant.
Both direct and transitive alignment are valuable concepts, especially for LLM AGIs, which I think are the only kind of directly aligned AGI we are likely to build, but which I suspect won’t be transitively aligned by default.
Since transitive alignment varies among humans (different humans have different inclinations towards building AGIs of uncertain alignment, given the capability to do so), it might be valuable to align LLM personalities to become people who are less likely to fail transitive alignment.