Let me put it this way then, how do you combine all of these tiny little microexperiences into a coherent macroexperience?
Microexperiences are unphysical: there are no electrons, only the global wavefunction. So you only have a decomposition problem. It is solved by weak illusionism: there is no real, fundamental, perfect isolation of qualia, just qualia of isolation. For every detailed description of the isolation of your qualia, either there is a non-contradicting physical description of an only approximately isolated part of reality, or your description is wrong, the same way a description of a chair works.
Yes, but I have a principled reason to special plead here. The complete description of the world is only complete from the third person perspective. It’s incomplete from a first person perspective because we need to explain the phenomenal character of consciousness.
I think this is circular? You started by justifying incompleteness with the inverted spectrum, received the objection that chairs are analogous, and then answered that the difference lies in incompleteness. The problem is that the chair analogy is correct: the difference between blue and red is completely describable by physics. You only need an intrinsic property of existence for the whole universe to solve zombies. But you also need it for a chair to be real.
Of course, I don’t think many physicalists actually believe in structural relations all the way down.
But visible misalignment being easy to detect and correlated with misaligned chain-of-thought doesn’t guarantee that training which eliminates visible misalignment and misaligned chain-of-thought results in a model that does things for the right reasons? The model can still learn unintended heuristics. And what’s the actual hypothesis about the model’s reasons when they appear to be right? That its learned reasoning algorithm is isomorphic to the reasoning algorithm of a helpful human who reads the same instructions, or what?