John Danaher on ‘The Superintelligent Will’

Philosopher John Danaher has written an explication and critique of Bostrom’s “orthogonality thesis” from “The Superintelligent Will.” To quote the conclusion:

Summing up, in this post I’ve considered Bostrom’s discussion of the orthogonality thesis. According to this thesis, any level of intelligence is, within certain weak constraints, compatible with any type of final goal. If true, the thesis might provide support for those who think it possible to create a benign superintelligence. But, as I have pointed out, Bostrom’s defence of the orthogonality thesis is lacking in certain respects, particularly in his somewhat opaque and cavalier dismissal of normatively thick theories of rationality.

As it happens, none of this may affect what Bostrom has to say about unfriendly superintelligences. His defence of that argument relies on the convergence thesis, not the orthogonality thesis. If the orthogonality thesis turns out to be false, then all that happens is that the kind of convergence Bostrom alludes to simply occurs at a higher level in the AI’s goal architecture.

What might, however, be significant is whether the higher-level convergence is a convergence towards certain moral beliefs or a convergence towards nihilistic beliefs. If it is the former, then friendliness might be necessitated, not simply possible. If it is the latter, then all bets are off. A nihilistic agent could do pretty much anything, since no goals would be rationally entailed.