This seems like the best or most accurate forecast to me.
A lot of the other examples people are listing are about (1) superintelligences and/or (2) models deliberately doing persuasion or crazy-inducing as an instrumental means of getting downstream effects, neither of which I think is true of what we’ve seen so far.
What do you think about this, written in 2018? It’s not as specific as Østergaard, but it predates him, and it’s also not specifically about superintelligence or downstream effects (like trying to get out of a box).
AIs could give us new options that are irresistible to some parts of our motivational systems, like more powerful versions of video game and social media addiction. In the course of trying to figure out what we most want or like, they could in effect be searching for adversarial examples on our value functions. At our own request or in a sincere attempt to help us, they could generate philosophical or moral arguments that are wrong but extremely persuasive.