I think I’m imagining a kind of “business as usual” scenario where alignment appears to be solved using existing techniques (like RLHF) or straightforward extensions of them, catastrophe is avoided, but AI fairly quickly comes to dominate the economy overwhelmingly. In this scenario alignment looks “easy,” but only in a superficial sense. The economy increasingly excludes humans, and political systems shift to accommodate the new reality.
This isn’t an argument for any new or different kind of alignment; I believe that alignment as you describe it would prevent this kind of problem.
This is my opinion only, and I’m approaching it from a historical perspective, so it’s possible it isn’t a good argument. But I think it’s at least worth considering: I don’t expect the alignment problem to be solved in time, yet we may still end up in a situation where AI systems that superficially appear aligned are widespread.