In other words, a lot of researchers work on stuff that is more or less known to be unable to address the hard problems.
The way I think about it is: It’s often considered a good idea to study simplified toy versions of various systems, on the reasoning that deriving results there would be easier, and that afterwards you’d be able to combine these disjoint toy models, each separately explaining one feature of the system, into one model holistically explaining every feature.
But what sometimes happens is the opposite. Sometimes studying toy setups, with load-bearing building blocks missing, results in overly complicated models that only confuse you. On the other hand, adding the building blocks back – making the initial setup more complex – actually leads to the desired features trivially falling out of it.
Right—the further issue being that for alignment, you have to understand minds, which are, to a significant extent, intrinsically holistic: studying some small fraction of a mind will always leave out the overarching mentality of the mind. Cf. https://tsvibt.github.io/theory/pages/bl_24_12_02_20_55_43_296908.html . E.g., several different mental elements have veto power over actions or self-modifications, and can take actions and do self-modifications. If you leave out several of these, your understanding of the overall dynamic is totally off the rails / totally divergent from what a mind actually does.
It’s a tricky balance to maintain.
Cf. https://www.lesswrong.com/posts/Ht4JZtxngKwuQ7cDC/tsvibt-s-shortform?commentId=koeti9ygXB9wPLnnF