> Despite not answering all possible goal-related questions a priori, the reductionist perspective does provide a tractable research program for improving our understanding of AI goal development. It does this by reducing questions about goals to questions about behaviors observable in the training data.

[emphasis mine]
This might be described as “a reductionist perspective”. It is certainly not “the reductionist perspective”, since reductionist perspectives need not limit themselves to “behaviors observable in the training data”.
A more reasonable-to-my-mind behavioral reductionist perspective might look like this.
Ruling out goal realism as a good way to think does not leave us with [the particular type of reductionist perspective you’re highlighting].
In practice, I think the reductionist perspective you point at is:
- Useful, insofar as it answers some significant questions.
- Highly misleading if we ever forget that [this perspective doesn’t show us that x is a problem] doesn’t tell us that [x is not a problem].
Relevant here is Geoffrey Irving’s AXRP podcast appearance. (If anyone has already linked this, I missed it.)
I think Daniel Filan does a good job there, both in clarifying debate and in questioning its utility (or at least the role of debate-as-solution-to-fundamental-alignment-subproblems). I don’t specifically remember satisfying answers to your (1)/(2)/(3), but it seemed worth pointing at regardless.