This prediction seems flatly wrong: I wouldn’t bring about an outcome like that. Why do I believe that? Because I have reasonably high-fidelity access to my own policy, via imagining myself in the relevant situations.
It seems like you’re confusing two things here, because the thing you would want is not knowable by introspection. What I think you’re introspecting is that if you’d noticed that the-thing-you-pursued-so-far was different from what your brother actually wants, you’d do what he actually wants. But the-thing-you-pursued-so-far doesn’t play the role of “your utility function” in the Goodhart argument. All of you plays into that. If the Goodharting were to play out, your detector for differences between the-thing-you-pursued-so-far and what-your-brother-actually-wants would simply fail to warn you that it was happening, because it too can only use a proxy measure for the real thing.
That prediction may be true. My argument is that “I know this by introspection” (or introspection plus generalization to others) is insufficient. For a concrete example, consider your 5-year-old self. I remember some pretty definite beliefs I had about my future self that turned out wrong, and if I ask myself how aligned I am with him, I don’t even know how to answer; he just seems way too confused and incoherent.
I think it’s also not absurd that you do have perfect caring in the sense relevant to the argument. This does not require that you make no mistakes currently. If you can, with increasing intelligence/information, correct yourself, then the pointer is perfect in the relevant sense. “Caring about the values of person X” is relatively simple and may come out of evolution, whereas “those values directly” may not.