What are the practical implications of alignment research in a world where AGI is hard?
Imagine we have a good alignment theory but do not have AGI. Could this theory be used to manipulate existing superintelligent systems such as science, the deep state, or the stock market? Does alignment research have any results which can be practically used outside of the AGI field right now?
Systems like the ones you mentioned aren’t single agents with utility functions we control—they’re made up of many humans whose utility functions we can’t control since we didn’t build them. This means alignment theory is not set up to align or manipulate these systems—it’s a very different problem.
There is alignment research that has been or is being performed on current-level AI, however—this is known as prosaic AI alignment. We also have some interpretability results that can be used to understand more about modern, non-AGI AIs. These results can be and have been used outside of AGI, but I’m not sure how practically useful they are right now—someone else might know more. If we had better alignment theory, at least some of it would likely be useful for aligning narrow AI as well.