A semi-autonomous AGI that does not have long-term preferences of its own but acts according to (its understanding of) the short-term preferences of some human or group of humans
In light of recent discussion, it seems like "short-term preferences" here should be clarified to say "actual preferences" or "short-term preferences-on-reflection".
Also, in the table, should Corrigible Contender's reliance on human safety be changed from "High" to "Medium"? (My feeling is that since the AI isn't relying on the current humans' elicited preferences, its reliance on human safety falls somewhere between that of Sovereign Singleton and Pivotal Tool.)
(I’m making these suggestions mainly because I expect people will continue to refer to this post in the future.)