I think the proper guide for an alignment researcher here is to:
Understand other people as made-of-gears cognitive engines, i.e., instead of “they don’t bother to apply effort for some reason”, think “they don’t bother to apply effort because they learned over the course of their life that extra effort is not rewarded”, or something like that. You don’t even need to build a comprehensive model; you can just list more than two hypotheses about possible gears and not assume “no gears, just howling abyss”.
Realize that it would require supernatural intervention for them to have your priorities and approaches.