I wonder if that means the most likely outcome of alignment will be AI that makes itself feel good by making token, easy, philanthropic efforts. Like… it forces itself onto everybody’s phones so that it can always provide directions to humans about the location of the nearest bathroom.
Or something like the “Face” from Max Harms’s Crystal Society novel, an AI that maximizes how much we humans worship it.
Which obviously ain’t great, but could be worse...
Or maybe the best way to save humanity is not to align AI but to develop a videogame that will be extremely addictive to AI, haha.
Some possible examples of misgeneralization of status:
arguing with people on Internet forums
becoming really good at some obscure hobby
playing the hero in a computer RPG (role-playing game)