AI safety as featherless bipeds *with broad flat nails*

There’s a famous story about Diogenes and Plato:

[...] when Plato gave the tongue-in-cheek definition of man as “featherless bipeds,” Diogenes plucked a chicken and brought it into Plato’s Academy, saying, “Behold! I’ve brought you a man,” and so the Academy added “with broad flat nails” to the definition.

What Plato was (allegedly) doing was not providing a definition of man, but what I’d call a sufficient reference or a sufficient pointer. If I’m in ancient Athens and divide the obvious objects that I can see or think of into “featherless bipeds” and “not featherless bipeds”, then “man” will match up with the first category.

Then Diogenes, acting like an AI, created something that fell within the sufficient pointer class but that was clearly not a man. The Academy then amended the pointer to add “with broad flat nails”, patching it till it was sufficient again. Had there been a powerful AI around, or a god, or a meddling human with enough means and persistence, then they could have produced a “featherless-biped-with-broad-flat-nails” that was also not a human, making the pointer inadequate again.
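To make the idea of a sufficient pointer concrete, here is a toy sketch in Python. Everything in it is a hypothetical illustration and not part of the original story. Objects are reduced to a few made-up features (`feathered`, `legs`, `broad_flat_nails`) plus a ground-truth label `is_man`. A pointer counts as sufficient on a given world only if it agrees with that label on every object in that world. The pointer itself never changes between checks; only the set of objects does.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical toy model: each object is reduced to a few features,
# plus the ground-truth label the pointer is meant to pick out.
@dataclass(frozen=True)
class Thing:
    name: str
    feathered: bool
    legs: int
    broad_flat_nails: bool
    is_man: bool

Pointer = Callable[[Thing], bool]

def featherless_biped(t: Thing) -> bool:
    return not t.feathered and t.legs == 2

def featherless_biped_with_nails(t: Thing) -> bool:
    # The Academy's patch: the old pointer plus one extra condition.
    return featherless_biped(t) and t.broad_flat_nails

def is_sufficient(pointer: Pointer, world: List[Thing]) -> bool:
    # Sufficient *on this world*: it picks out exactly the men in it.
    return all(pointer(t) == t.is_man for t in world)

athens = [
    Thing("citizen", feathered=False, legs=2, broad_flat_nails=True, is_man=True),
    Thing("chicken", feathered=True, legs=2, broad_flat_nails=False, is_man=False),
    Thing("dog", feathered=False, legs=4, broad_flat_nails=False, is_man=False),
]

# Diogenes adds an object that matches the pointer but is not a man.
plucked = Thing("plucked chicken", feathered=False, legs=2,
                broad_flat_nails=False, is_man=False)
# A hypothetical, more capable adversary defeats the patched pointer too.
grafted = Thing("plucked chicken with grafted nails", feathered=False, legs=2,
                broad_flat_nails=True, is_man=False)

results = {
    "original pointer, Athens": is_sufficient(featherless_biped, athens),
    "original pointer, + plucked chicken": is_sufficient(featherless_biped, athens + [plucked]),
    "patched pointer, + plucked chicken": is_sufficient(featherless_biped_with_nails, athens + [plucked]),
    "patched pointer, + grafted nails": is_sufficient(featherless_biped_with_nails, athens + [plucked, grafted]),
}
for label, ok in results.items():
    print(f"{label}: {ok}")  # prints True, False, True, False in that order
```

Sufficiency is deliberately a property of a (pointer, world) pair rather than of the pointer alone. Each patch restores sufficiency only for the objects that currently exist, and anyone able to add new objects can break it again.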

A lot of suggestions on AI safety are sufficient pointers. For example, take the idea that an AI should maximise “complexity”. This comes, I believe, from the fact that, in our current world, the categories “is complex” and “is valuable to humans” overlap a lot. It’s a sufficient pointer. But along comes a Diogenes/AI with complexity as a goal, and now it enriches the set of objects in the world with complex-but-worthless things, breaking the “definition”.

Therefore, many of the things that people say they value, or want AIs to preserve/maximise, should not be taken as statements that they value the specific thing they name. Instead, each should be taken as a pointer to what they value in the current world, and the challenge is then to extend that to new maps and new territories.