Thanks for this! My hot take: I’m worried about this methodology because it seems that the “test against philosophical intuitions” step might rule out (or steer you away from) definitions of goal-directedness that work fine for the AI risk arguments. I think I agree that philosophical intuitions have a role to play, but it’s more like a prior-constructing or search-guiding role than a test that needs to be passed or a constraint that needs to be met. And perhaps you agree with this also, in which case maybe we don’t disagree at all.
It might not be clear from the lit review, but I personally don't agree with all the intuitions, or not completely. And I definitely believe that a definition that throws out some of the intuitions but still applies to the AI risk arguments is totally fine. It's more that I believe the gist of these intuitions points in the right direction, and so I want to keep them in mind.
Yep, we seem to agree.