I’m writing a post comparing some high-level approaches to AI alignment in terms of their false positive risk. Trouble is, there’s no standard agreement on what the high-level approaches to AI alignment are today — neither on what constitutes a high-level approach nor on where to draw the lines when categorizing specific approaches.
So, I’ll open this up as a question to get some feedback before I get too far along: what do you consider to be the high-level approaches to AI alignment?
(I’ll supply my own partial answer below.)