Agreed that the orthogonality thesis only implies that many points on the alignment/capability graph can exist, but that’s because the thesis assumes the alignment/capability distinction is a meaningful one. Introspectively, it feels like a reasonable distinction to wonder about: it applies to me and to other humans I talk to about such topics.
I don’t think there’s any way to determine, strictly from behaviors/outputs, what adjustments would be needed to make an agent do what you want. Especially with an agent that is more competent (at some things) than me, I don’t think I can figure out why it’s less effective (or negatively effective) at other things.