Yup, this all seems basically right. Though in reality I’m not that worried about the “we might outlaw some good actions” half of the dilemma. In real-world settings, actions are so multi-faceted that being able to outlaw a class of actions based on any simple property would be a research triumph.
Also see https://www.lesswrong.com/posts/LR8yhJCBffky8X3Az/using-predictors-in-corrigible-systems or https://www.lesswrong.com/posts/qpZTWb2wvgSt5WQ4H/defining-myopia for successor lines of reasoning.
Yes, I too am more concerned from a ‘maybe this framing isn’t super useful as it fails to capture important distinctions between corrigible and non-corrigible’ point of view rather than a ‘we might outlaw some good actions’ point of view.
Thanks for the links, they look interesting!