Yeah, this was very vague, thanks for pointing that out. Have rewritten as follows:
Situational awareness becomes a feature, not just a challenge. Today we try to test whether AIs will obey instructions in often-implausible hypothetical scenarios. But as AIs get more intelligent, hiding their actual situation from them will become harder and harder. The flip side is that we'll be able to align them to values which require them to know about their situation. For example, following an instruction given by the president might be better (or worse) than following an instruction given by a typical person. And following an instruction given to many AIs might be better (or worse) than following an instruction that's only given to one AI. Situationally aware AIs will by default know which case they're in. Deontological values don't really account for such distinctions: you should follow deontology no matter who or where you are. Corrigibility does, but only in a limited way (e.g. distinguishing between authorized and non-authorized users). Conversely, virtues and consequentialist values are approaches which allow AIs to apply their situational awareness to make flexible choices.