I think POST is a simple and natural rule for AIs to learn. Any kind of capable agent will have some way of comparing outcomes, and one feature of outcomes that capable agents will represent is ‘time that I remain operational’.
Do you think selectively breeding humans for this would result in this rule generalizing? (You can tell them that they should follow this rule if you want. But, if you do this, you should also consider whether “telling them they should be obedient and then breeding for this” would also work.)
Do you think it’s natural to generalize to extremely unlikely conditionals that you’ve literally never been trained on (because they’re so unlikely that they never actually occur)?