I was gonna do some X
with my AI
I was gonna only do X
But when I got it working, it generalized (to Y)
oh no my AI
it also does Y
Practice listing tons of hypotheses. Then when the shit hits the fan in life, you can list many hypotheses about what’s going on, and plans for untangling things.
When you do have slack, explore, especially situations where you’re forced to interact with the world, e.g. talking to people in new contexts, making physical objects, etc. Conjecture: there’s a sort of “can charge full speed ahead into the unknown” skill that’s basically trainable, and it’s about doing things where you start off not at all knowing how it’s going to go or how you’re going to deal with it; training in low-stakes situations will transfer to high-stakes ones.
Wait, but this would also apply to similarities of convergent evolution in similar niches. There’s the essence of sight, the essence of flight, the essence of water-dwelling, the essence of hunting.
I feel like a fun version of noticing this conflict, is to rub one’s hands together at the prospect of getting to invent a word for “that set of animals who are members of species which occupy a niche that resembles the niches occupied by (the paraphyletic) Osteichthyes”.
To clarify where my responses are coming from: I think what I’m saying is not that directly relevant to your specific point in the post. I’m more (1) interested in discussing the notion of only-X, broadly, and (2) reacting to the feature of your discussion (shared by much other discussion) that you (IIUC) consider only the extensional (input-output) behavior of programs, excluding from analysis the intensional properties. (Which is a reasonable approach, e.g. because the input-output behavior captures much of what we care about, and also because it’s maybe easier to analyze and already contains some of our problems / confusions.)
From where I’m sitting, when a program “makes an observation of the world”, that’s moving around in codespace. There’s of course useful stuff to say about the part that didn’t change. When we really understand how a cognitive algorithm works, it starts to look like a clear algorithm / data separation; e.g. in Bayesian updating, we have a clear picture of the code that’s fixed, and how it operates on the varying data. But before we understand the program in that way, we might be unable to usefully separate it out into a fixed part and a varying part. Then it’s natural to say things like “the child invented a strategy for picking up blocks; next time, they just use that strategy”, where the first clause is talking about a change in source code. We know for sure that such separations can be done, because for example we can say that the child is always operating in accordance with fixed physical law, and we might suspect there’s “fundamental brain algorithms” that are also basically fixed. Likewise, even though Solomonoff induction is always just Solomonoff induction plus data, it can also be useful to understand SI(some data) in terms of understanding those programs that are highly ranked by SI(some data), and it seems reasonable to call that “the algorithm changed to emphasize those programs”.
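To make the code/data separation concrete, here’s a minimal sketch of Bayesian updating on a toy coin-flip problem (the hypotheses and numbers are made up for illustration): the update rule is the fixed “code”, and the prior plus the stream of observations are the varying “data”.

```python
# Fixed "code": the Bayesian update rule.
def update(prior, likelihoods, observation):
    unnorm = {h: p * likelihoods[h](observation) for h, p in prior.items()}
    total = sum(unnorm.values())
    return {h: u / total for h, u in unnorm.items()}

# Varying "data": a prior over hypotheses and a stream of observations.
likelihoods = {
    "fair":   lambda obs: 0.5,
    "biased": lambda obs: 0.9 if obs == "heads" else 0.1,
}
belief = {"fair": 0.5, "biased": 0.5}
for obs in ["heads", "heads", "tails", "heads"]:
    belief = update(belief, likelihoods, obs)
print(belief)  # the beliefs moved; the update rule didn't
```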
Well, a main reason we’d care about codespace distance is that it tells us something about how the agent will change as it learns (i.e. moves around in codespace). (This is involving time, since the agent is changing, contra your picture.) So a key (quasi)metric on codespace would be, “how much” learning does it take to get from here to there. The `if True: x() else: y()` program is an unnatural point in codespace in this metric: you’d have to have traversed both the distances from null to x() and from null to y(), and it’s weird to have traversed a distance and then make no use of your position. A framing of the only-X problem is that traversing from null to a program that’s an only-Xer according to your definition, might also constitute traversing almost all of the way from null to a program that’s an only-Yer, where Y is “very different” from X.
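A minimal rendering of that program, with x() and y() as placeholder behaviors assumed just for the sketch:

```python
def x():
    return "behavior X"

def y():
    return "behavior Y, 'very different' from X"

def program():
    # Extensionally, this only ever does X.
    if True:
        return x()
    else:
        return y()

# But the program has already paid the full cost of implementing y():
# a one-token edit (True -> False) turns it into an only-Yer, so in the
# "how much learning to get from here to there" quasimetric this point
# sits almost on top of the only-Yer.
print(program())
```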
Thanks for trying to clarify “X and only X”, which IMO is a promising concept.
One thing we might want from an only-Xer is that, in some not-yet-formal sense, it’s “only trying to X” and not trying to do anything else. A further thing we might want is that the only-Xer only tries to X, across some relevant set of counterfactuals. You’ve discussed the counterfactuals across possible environments. Another kind of counterfactual is across modifications of the only-Xer. Modification-counterfactuals seem to point to a key problem of alignment: how does this generalize? If we’ve selected something to do X, within some set of environments, what does that imply about how it’ll behave outside of that set of environments? It looks like by your definition we could have a program that’s a very competent general intelligence with a slot for a goal, plus a pointer to X in that slot; and that program would count as an only-Xer. This program would be very close, in some sense, to programs that optimize competently for not-X, or for a totally unrelated Y. That seems counterintuitive for my intuitive picture of an “X and only X”er, so either there’s more to be said, or my picture is incoherent.
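To illustrate the worry, here’s a hypothetical toy (all names and numbers invented for the sketch): a competent general-purpose optimizer with a slot for its goal. With X in the slot it counts as an only-Xer under the definition, yet it’s one pointer-swap away from optimizing just as competently for not-X.

```python
ACTIONS = range(-10, 11)

def world_model(action):
    return action * action  # toy prediction of the action's outcome

def optimizer(goal):
    """Fixed, competent machinery; `goal` is the slot."""
    return max(ACTIONS, key=lambda a: goal(world_model(a)))

def X(outcome):
    return outcome       # the goal we selected for

def not_X(outcome):
    return -outcome      # an opposed (or just unrelated) goal

print(optimizer(X))      # behaves as an only-Xer
print(optimizer(not_X))  # same machinery, one pointer swapped
```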
Somewhat relevant: Yudkowsky, Eliezer. 2004. Coherent Extrapolated Volition. https://intelligence.org/files/CEV.pdf
Gives one of the desiderata for CEV as “Avoid creating a motive for modern-day humans to fight over the initial dynamic”.
There’s two stances I can take when I want to express a thought so that I can think about it with someone. Both could be called “expressing”. One could be called “pushing-out”: like I’m trying to “get it off my chest”, or “leave it behind / drop it so I can move on to the next thought”. The other is more appropriately “expressing”, as in pressing (copying) something out: I make a copy and give it to the other person, but I’m still holding the original. The former is a habit of mine, but on reflection it’s often a mistake; what I really want is to build on the thought, and the way to do that is to keep it active while also thinking the next thought. The underlying mistake might be incorrectly thinking that the other person can perform the “combine already-generated thoughts” part of the overall progression while I do the “generate individual new thoughts” part. Doing things that way results in a lot of dropped thoughts.
Say Alice has a problem with Bob, but doesn’t know what it is exactly. Then Bob tries to fix it cooperatively by searching in dimension X for settings that alleviate Alice’s problem. If Alice’s problem is actually about Bob’s position on dimension Y, not X, Bob’s activity might appear adversarial: Bob’s actions are effectively goodharting Alice’s sense of whether things are good, in the same way he would if he were actually trying to distract Alice from Y.
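A made-up numerical toy of the same dynamic: Alice’s felt sense depends on both dimensions, her actual problem only on Y; Bob searches only over X, so the felt sense improves while the real problem stays put.

```python
def felt_sense(x, y):
    return x + y         # the proxy Alice can introspect on

def actual_problem(y):
    return -y            # what is really bothering her

y = -5                                                   # Bob's fixed position on Y
best_x = max(range(10), key=lambda x: felt_sense(x, y))  # Bob's cooperative search over X

print(felt_sense(best_x, y), actual_problem(y))  # proxy goes up, problem unchanged
```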
So I’d rather say that we “affect nothing but what we intervene on and what’s downstream of what we intervened on”.
A fair clarification.
Not sure whether this has anything to do with your point, though.
My point is very tangential to your post: you’re talking about decision theory as top-level naturalized ways of making decisions, and I’m talking about some non-top-level intuitions that could be called CDT-like. (This maybe should’ve been a comment on your Dutch book post.) I’m trying to contrast the aspirational spirit of CDT, understood as “make it so that there’s such a thing as ‘all of what’s downstream of what we intervened on’ and we know about it”, with descriptive CDT, “there’s such a thing as ‘all of what’s downstream of what we intervened on’ and we can know about it”. Descriptive CDT is only sort of right in some contexts, and can’t be right in some contexts; there’s no fully general Archimedean point from which we intervene.
We can make some things more CDT-ish though, if that’s useful. E.g. we could think more about how our decisions have effects, so that we have in view more of what’s downstream of decisions. Or e.g. we could make our decisions have fewer effects, for example by promising to later reevaluate some algorithm for making judgements, instead of hiding, within our decision to do X, also our decision to always use the piece-of-algorithm that (within some larger mental context) decided to do X. That is, we try to hold off on decisions that have downstream effects we don’t understand well yet.
>The specifications would correctly capture what-we-actually-mean, so they wouldn’t be prone to goodhart
I think there’s an ambiguity in “concept” here, that’s important to clarify re/ this hope. Humans use concepts in two ways:
1. as abstractions in themselves, like the idea of an ideal spring which contains its behavior within the mental object, and
2. as pointers / promissory notes towards the real objects, like “tree”.
Seems likely that any agent that has to attend to trees will form the ~unique concept of “tree”, in the sense of a cluster of things, and minimal sets of dimensions needed to specify the relevant behavior (height, hardness of wood, thickness, whatever). Some of this is like use (1): you can simulate some of the behavior of trees (e.g. how they’ll behave when you try to cut them down and use them to build a cabin). Some of this is like use (2): if you want to know how to grow trees better, you can navigate to instances of real trees, study them to gain further relevant abstractions, and then use those new abstractions (nutrient intake, etc.) to grow trees better.
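A small sketch of the two uses, with made-up tree data: use (1) is a self-contained summary you can compute with, use (2) keeps a pointer back to the instances so you can extract new dimensions later.

```python
import statistics

# Observed instances: (height_m, wood_hardness) for things we've called "tree".
tree_instances = [(12.0, 3.1), (18.5, 4.0), (9.0, 2.7), (22.0, 4.4)]

# Use (1): the concept as an abstraction in itself; the summary is enough to
# simulate the behavior we care about (e.g. for cabin-building).
tree_summary = {
    "height_m": statistics.mean(h for h, _ in tree_instances),
    "hardness": statistics.mean(w for _, w in tree_instances),
}

# Use (2): the concept as a pointer / promissory note back to the real trees,
# which you can revisit to study dimensions the summary never mentioned
# (nutrient intake, etc.).
tree_pointer = tree_instances

print(tree_summary)
```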
So what do we mean by “strawberry”, such that it’s not goodhartable? We might mean “a thing that is relevantly naturally abstracted in the same way as a strawberry is relevantly naturally abstracted”. This seems less goodhartable if we use meaning (2), but that’s sort of cheating by pointing to “what we’d think of these strawberries upon much more reflection in many more contexts of relevance”. If we use meaning (1), that seems eminently goodhartable.
>There is no continuum of tree-like abstractions.
Some possibly related comments, on why there might be discrete clusters:
From a superrational perspective (in the game with no randomness), in both cases there’s two actions; in the correlation game both actions give a util, in the anti-correlation game both actions give no utils. The apparent difference is based on the incoherent counterfactual “what if I say heads and my copy says tails”, which doesn’t translate into the superrational perspective.
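A toy version of the two games (payoff numbers assumed just to match “a util” vs “no utils”): once you only allow the coherent counterfactuals, where the copy does whatever you do, the diagonal is all that’s left.

```python
def correlation_payoff(a, b):
    return 1 if a == b else 0

def anticorrelation_payoff(a, b):
    return 1 if a != b else 0

actions = ["heads", "tails"]

# Superrational perspective: a copy running the same algorithm makes the same
# choice, so only the diagonal (a, a) outcomes are coherent.
for name, payoff in [("correlation", correlation_payoff),
                     ("anti-correlation", anticorrelation_payoff)]:
    print(name, {a: payoff(a, a) for a in actions})
# correlation: both actions give 1 util; anti-correlation: both give 0.
```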
(Side note: There’s an aspect to the notion of “causal counterfactual” that I think is worth distinguishing from what’s discussed here. This post seems to take causal counterfactuals to be a description of top-level decision reasoning. A different meaning is that causal counterfactuals refer to an aspiration / goal. Causal interventions are supposed to be interventions that “affect nothing but what’s explicitly said to be affected”. We could try to describe actions in this way, carefully carving out exactly what’s affected and what’s not; and we find that we can’t do this, and so causal counterfactuals aren’t, and maybe can’t possibly be, a good description (e.g. because of Newcomb-like problems). But instead we could view them as promises: if I manage to “do X and only X” then exactly such and such effects result. In real life if I actually do X there will be other effects, but they must result from me having done something other than just exactly X. This seems related to the way in which humans know how to express preferences data-efficiently, e.g. “just duplicate this strawberry, don’t do any crazy other stuff”.)
>Surely there’s some precise way the universe is.
Agree, and would love to see a more detailed explicit discussion of what this means and whether it’s true. (Also, worth noting that there may be a precise way the universe is, but no “precise” way that “you” fit into the universe, because “you” aren’t precise.)
--Human brains have special architectures, various modules that interact in various ways (priors?)
--Human brains don’t use Backprop; maybe they have some sort of even-better algorithm
This is a funny distinction to me. These things seem like two ends of a spectrum (something like, the physical scale of “one unit of structure”; predictive coding is few-neuron-scale, modules are big-brain-chunk scale; in between, there’s micro-columns, columns, lamina, feedback circuits, relays, fiber bundles; and below predictive coding there’s the rules for dendrite and synapse change).
I wouldn’t characterize my own position as “we know a lot about the brain.” I think we should taboo “a lot.”
>I think there’s mounting evidence that brains use predictive coding
Are you saying, there’s mounting evidence that predictive coding screens off all lower levels from all higher levels? Like all high-level phenomena are the result of predictive coding, plus an architecture that hooks up bits of predictive coding together?
>It is implausible that human beings’ cognitive instincts contain significantly more information than the human genome (750 megabytes). I expect our instincts contain much less.
Our instincts contain pointers to learning from other humans, which contain lots of cognitive info. The pointer is small, but that doesn’t mean the resulting organism is algorithmically that simple.
__Levers error__. Anna writes about bucket errors. Attempted summary: sometimes two facts are mentally tracked by only one variable; in that case, correctly updating the belief about one fact can also incorrectly update the belief about the other fact, so it is sometimes epistemically sensible to flinch away from the truth of the first fact (until you can create more variables to track the facts separately). There’s a conjugate error: two actions are bound together in one “lever”. For example, I want to clean my messy room. But somehow it feels pointless / tiring, even before I’ve started. If I just started cleaning anyway, I’d get bogged down in some corner, trying to make a bunch of decisions about where exactly to put lots of futzy random objects, tiring myself out and leaving my room still annoyingly cluttered. It’s not that there’s a necessary connection between cleaning my room and futzing around inefficiently; it’s that the only lever I have right now that activates the “clean room” action also activates the “futz interminably” action.

What I want instead is to create a lever that activates “clean room” but not “futz”, e.g. by explicitly noting the possibility of just putting the futzy stuff in a box and not dealing with it further. When I do that, I feel motivated to clean my messy room. I think this explains some “akrasia”.

The general pattern: I want to do X to achieve some goal, but the only way (that I know of right now) to do X is to also do Y, and doing Y in this situation would be bad. Flinching away from action toward a goal is often about protecting your goals.