I think we live in a world where alignment is impossible. In my opinion, all attention-based models are complex enough to be computationally irreducible: there is no shorter way to know the outcome than to run the system itself, as with Rule 110. If it is impossible to predict the outcome with certainty, it follows logically that it is impossible to force a desired outcome.
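For anyone unfamiliar with Rule 110, here is a minimal Python sketch of it. The function names, the wrap-around boundary, and the starting row are my own assumptions for illustration. It shows only the update rule and what "running the system itself" looks like; it does not demonstrate irreducibility, and it says nothing about attention-based models.

```python
RULE = 110  # the 8 bits of this number are the lookup table for the rule


def step(cells):
    """Apply one Rule 110 update to a row of 0/1 cells.

    Assumption: the row wraps around at the edges (periodic boundary).
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left = cells[(i - 1) % n]
        center = cells[i]
        right = cells[(i + 1) % n]
        # The three neighbours form a number 0..7 that indexes into RULE's bits.
        index = left * 4 + center * 2 + right
        new_cells.append((RULE >> index) & 1)
    return new_cells


def run(cells, steps):
    """Return the starting row plus each successive row, computed one by one."""
    history = [list(cells)]
    for _ in range(steps):
        cells = step(cells)
        history.append(cells)
    return history


if __name__ == "__main__":
    width = 32
    start = [0] * (width - 1) + [1]  # a single live cell at the right edge
    for row in run(start, 16):
        print("".join("#" if c else "." for c in row))
```

The sketch gets row t by computing every row before it, which is the sense of "run the system itself" used above.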
Humanity has not even solved the alignment of humans (children).
I think we’ve done an OK job at human alignment, given that the pension isn’t a bullet to the head.
I somewhat suspect that alignment is easier than most of LessWrong thinks, but I’m definitely in the minority in this space.