Assessing Kurzweil: the results

Predictions of the future rely, to a much greater extent than in most fields, on the personal judgement of the expert making them. Just one problem: personal expert judgement generally sucks, especially when the experts don’t receive immediate feedback on their hits and misses. Formal models perform better than experts, but when talking about unprecedented future events such as nanotechnology or AI, the choice of the model is also dependent on expert judgement.

Ray Kurzweil has a model of technological intelligence development where, broadly speaking, evolution, pre-computer technological development, post-computer technological development and future AIs all fit into the same exponential increase. When assessing the validity of that model, we could look at Kurzweil’s credentials, and maybe compare them with those of his critics, but Kurzweil has given us something even better than credentials: a track record. In various books, he’s made predictions about what would happen in 2009, and we’re now in a position to judge their accuracy. I haven’t been satisfied by the various accuracy ratings I’ve found online, so I decided to do my own assessments.

I first selected ten of Kurzweil’s predictions at random, and gave my own estimation of their accuracy. I found that five were to some extent true, four were to some extent false, and one was unclassifiable.

But of course, relying on a single assessor is unreliable, especially when some of the judgements are subjective. So I started a call for volunteers to get assessors. Meanwhile Malo Bourgon set up a separate assessment on Youtopia, harnessing the awesome power of altruists chasing after points.

The results are now in, and they are fascinating. They are...

Oops, you thought you’d get the results right away? No, before that, as on Oscar night, I first want to thank assessors William Naaktgeboren, Eric Herboso, Michael Dickens, Ben Sterrett, Mao Shan, quinox, Olivia Schaefer, David Sønstebø and one who wishes to remain anonymous. I also want to thank Malo, and Ethan Dickinson and all the other volunteers from Youtopia (if you’re one of these, and want to be thanked by name, let me know and I’ll add you).

It was difficult deciding on the MVP. No, actually it wasn’t: that title and many thanks go to Olivia Schaefer, who decided to assess every single one of Kurzweil’s predictions, because that’s just the sort of gal that she is.

The exact details of the methodology, and the raw data, can be accessed through here. But in summary, volunteers were asked to assess the 172 predictions (from the “Age of Spiritual Machines”) on a five-point scale: 1=True, 2=Weakly True, 3=Cannot decide, 4=Weakly False, 5=False. If we total up all the assessments made by my direct volunteers, we have:

As can be seen, most assessments were rather emphatic: fully 59% were either clearly true or false. Overall, 46% of the assessments were false or weakly false, and 42% were true or weakly true.

But what happens if, instead of averaging across all assessments (which allows assessors who have worked on a lot of predictions to dominate), we average across the nine assessors? Reassuringly, this makes very little difference:
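To make the distinction between the two averages concrete, here is a minimal Python sketch. The ratings in it are invented for illustration and are not the study's data; the assessor labels and the function names (`true_share`, `pooled_true_share`, `per_assessor_true_share`) are likewise hypothetical. It only shows how pooling every assessment can differ from first averaging within each assessor.

```python
from collections import defaultdict

# Hypothetical (assessor, rating) pairs, NOT the real data.
# Scale as in the post: 1=True, 2=Weakly True, 3=Cannot decide,
# 4=Weakly False, 5=False.
ratings = [
    ("A", 1), ("A", 1), ("A", 2), ("A", 5),
    ("A", 5), ("A", 4), ("A", 3), ("A", 1),
    ("B", 5), ("B", 4),
    ("C", 1), ("C", 5),
]


def true_share(values):
    """Fraction of ratings that are True (1) or Weakly True (2)."""
    return sum(1 for v in values if v in (1, 2)) / len(values)


def pooled_true_share(pairs):
    """Average across all assessments: prolific assessors weigh more."""
    return true_share([rating for _, rating in pairs])


def per_assessor_true_share(pairs):
    """Average across assessors: each assessor counts once."""
    by_assessor = defaultdict(list)
    for name, rating in pairs:
        by_assessor[name].append(rating)
    shares = [true_share(values) for values in by_assessor.values()]
    return sum(shares) / len(shares)


if __name__ == "__main__":
    print(f"pooled:       {pooled_true_share(ratings):.1%}")
    print(f"per assessor: {per_assessor_true_share(ratings):.1%}")
```

With these invented ratings the pooled true share is 5/12 while the per-assessor share is 1/3, because assessor A contributes eight of the twelve ratings. The post's finding is that on the real data the two approaches gave nearly the same picture.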

What about the Youtopia volunteers? Well, they have a decidedly different picture of Kurzweil’s accuracy:

This gives a combined true score of 30%, and combined false score of 57%! If my own personal assessment was the most positive towards Kurzweil’s predictions, then Youtopia’s was the most negative.

Putting this all together, Kurzweil certainly can’t claim an accuracy above 50%, a far cry from his own self-assessment of either 102 out of 108 or 127 out of 147 correct (with caveats that “even the predictions that were considered ‘wrong’ in this report were not all wrong”). And consistently, slightly more than 10% of his predictions are judged “impossible to decide”.

As I’ve said before, these were not binary yes/no predictions: even a true rate of 30% is much higher than chance. So Kurzweil remains an acceptable prognosticator, with very poor self-assessment.