Appendix: Five Worlds of Orthogonality
How much of a problem Pythia is depends on how strongly the Orthogonality Thesis holds.
Strong Orthogonality
All goals are equally easy to design an agent to pursue, beyond the inherent tractability of that goal.[1]
Orthogonality
There can exist arbitrarily intelligent agents pursuing any kind of goal.
Obliqueness
Agents do not tend to factorize into an Orthogonal value-like component and a Diagonal belief-like component; rather, there are Oblique components that do not factorize neatly.
Diagonality
All sufficiently advanced systems converge towards maximizing intelligence/power/influence/self-evidencing, shredding all their other values in the process.
Universalist moral internalism
What is right must be universally motivating, so all sufficiently advanced AI systems discover objective moral truth and do Good Things. (Maybe it takes them a while to converge.)
[1] e.g., a goal of “Make paperclips iff P = NP” would require a system that could determine whether P = NP.
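To make the footnote's point concrete, here is a minimal sketch (not from the source; all names are illustrative) of how a goal's difficulty can live inside the goal itself: if the utility function conditions on an open mathematical question, an agent cannot even evaluate its goal without settling that question.

```python
def p_equals_np() -> bool:
    """Stand-in for deciding P = NP; no known procedure settles this."""
    raise NotImplementedError("resolving P vs NP is the hard part")

def utility(paperclips: int) -> int:
    """Utility for the goal 'make paperclips iff P = NP'.

    The goal is only as tractable as the question embedded in it.
    """
    try:
        return paperclips if p_equals_np() else 0
    except NotImplementedError:
        # Unable to evaluate its own goal, the agent gets no signal.
        return 0
```

This is the sense in which Strong Orthogonality carves out "the inherent tractability of that goal": designing an agent to pursue this goal is no harder than the goal itself, but the goal itself is as hard as P vs NP.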