- I think the debate would probably benefit from a better specification of what is meant by “misalignment” or “solving alignment” -- I do not think the convincing versions of gradual disempowerment either rely on misalignment or result in power concentration among humans, for a relatively common meaning of alignment roughly at the level of “does what the developer wants and approves, resolving conflicts between their wants in a way which is not egregiously bad”. If “aligned” means something at the level of “implements the coherent extrapolated volition of humanity” or “solves AI safety”, then yes.
Just checking: Would you say that the AIs in “you get what you measure” and “Another (outer) alignment failure story” are substantially less aligned than “does what the developer wants and approves, resolving conflicts between their wants in a way which is not egregiously bad”?