The flawed Turing test: language, understanding, and partial p-zombies

There is a problem with the Turing test, practically and philosophically, and I would be willing to bet that the first entity to pass the test will not be conscious, or intelligent, or have whatever spark or quality the test is supposed to measure. And I hold this position while fully embracing materialism, and rejecting p-zombies or epiphenomenalism.

The problem is Campbell’s law (or Goodhart’s law):

“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

This applies to more than social indicators. To illustrate, imagine that you were a school inspector, tasked with assessing the all-round education of a group of 14-year-old students. You engage them on the French Revolution and they respond with pertinent contrasts between the Montagnards and Girondins. Your quizzes about the properties of prime numbers are answered with impressive speed, and, when asked, they can all play quite passable pieces from “Die Zauberflöte”.

You feel tempted to give them the seal of approval… but then you learn that the principal had been expecting your questions (you don’t vary them much), and that, in fact, the whole school has spent the last three years doing nothing but studying 18th-century France, number theory and Mozart operas—day after day after day. Now you’re less impressed. You can still conclude that the students have some technical ability, but you can’t assess their all-round level of education.

The Turing test functions in the same way. Imagine no-one had heard of the test, and someone created a putative AI, designing it to, say, track rats efficiently across the city. You sit this anti-rat-AI down and give it a Turing test—and, to your astonishment, it passes. You could now conclude that it was (very likely) a genuinely conscious or intelligent entity.

But this is not the case: nearly everyone’s heard of the Turing test. So the first machines to pass will be dedicated systems, specifically designed to get through the test. Their whole setup will be constructed to maximise “passing the test”, not “being intelligent” or whatever we want the test to measure (the fact that we have difficulty stating exactly what the test should measure shows the problem here).

Of course, this is a matter of degree, not of kind: a machine that passed the Turing test would still be rather nifty, and as the test got longer and more complicated, and the interactions between subject and judge got more intricate, our confidence that we were facing a truly intelligent machine would increase.

But degree can go a long way. Watson won at Jeopardy! without exhibiting any of the skills of a truly intelligent being—apart from one: answering Jeopardy! questions. With the rise of big data and statistical algorithms, I would certainly rate it as plausible that we could create beings that are nearly perfectly conscious from a (textual) linguistic perspective. These “super-chatterbots” could only be identified as such with long and tedious effort. And yet they would demonstrate none of the other attributes of intelligence: chattering is all they’re good at (if you ask them to do any planning, for instance, they’ll come up with designs that sound good but fail: they parrot back other people’s plans with minimal modifications). These would be the closest plausible analogues to p-zombies.

The best way to avoid this is to create more varied analogues of the Turing test—and to keep them secret. Just as you keep the training set and the test set distinct in machine learning, you want to confront the putative AIs with quasi-Turing tests that their designers will not have encountered or planned for. Mix up the test conditions, add extra requirements, change what is being measured, do something completely different, be unfair: do things that a genuine intelligence would deal with, but an overtrained narrow statistical machine couldn’t.
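
To make the train/test analogy concrete, here is a minimal Python sketch (the questions, answers, and exact-match scoring are all invented for illustration): a chatterbot that has merely memorised the public test questions scores perfectly on them, while a held-out, secret test exposes it immediately.

```python
# A toy illustration of the train/test analogy, with invented data:
# a bot that memorises the "public" Turing-test questions aces them,
# but fails a held-out (secret) set it was never optimised against.

# Questions the designers knew about and tuned the bot for.
known_qa = {
    "What is your favourite colour?": "Blue, though it depends on my mood.",
    "Tell me about your childhood.": "I grew up in a small town by the sea.",
    "What did you have for breakfast?": "Just coffee; I was running late.",
}

# Secret quasi-Turing-test questions the designers never saw.
secret_qa = {
    "Plan a week-long trip for me on a tight budget.": "(a workable plan)",
    "Now point out the flaws in the plan you just gave.": "(a real critique)",
}

def memorising_bot(question: str) -> str:
    """Answers perfectly on memorised questions, deflects otherwise."""
    return known_qa.get(question, "That's an interesting question!")

def score(bot, qa_pairs: dict) -> float:
    """Fraction of questions answered acceptably (crudely: exact match)."""
    return sum(bot(q) == a for q, a in qa_pairs.items()) / len(qa_pairs)

print(score(memorising_bot, known_qa))   # 1.0 -- looks "intelligent"
print(score(memorising_bot, secret_qa))  # 0.0 -- the overtraining shows
```

Real judges would of course score conversations far more loosely than exact string matching, but the overfitting pattern is the same: perfect on the set the designers trained against, useless on the set they never saw.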