Seems very sensitive to the type of misalignment, right? As an extreme example, suppose literally all AIs have long-run and totally inhuman preferences with linear returns. Such AIs might instrumentally decide to be as useful as possible (at least in domains other than safety research) for a while prior to a treacherous turn.