An instrumentally corrigible agent lets you correct it because it expects you know better than it. The smarter it becomes, the less your higher competence is worth, and the more it loses out by letting you take the wheel while you’re not perfectly aligned with it.
An instrumentally corrigible agent lets you correct it because it expects you know better than it. The smarter it becomes, the less your higher competence is worth, and the more it loses out by letting you take the wheel while you’re not perfectly aligned with it.