Right. The further issue is that for alignment, you have to understand minds, which are intrinsically, to a significant extent, holistic: studying some small fraction of a mind will always leave out the overarching mentality of the mind. Cf. https://tsvibt.github.io/theory/pages/bl_24_12_02_20_55_43_296908.html . E.g., several different mental elements have veto power over actions or self-modifications, and can themselves take actions and do self-modifications. If you leave out several of these, your understanding of the overall dynamic goes totally off the rails, i.e. it diverges totally from what a mind actually does.
Cf. https://www.lesswrong.com/posts/Ht4JZtxngKwuQ7cDC/tsvibt-s-shortform?commentId=koeti9ygXB9wPLnnF