Kurzweil’s predictions: good accuracy, poor self-calibration

Pre­dic­tions of the fu­ture rely, to a much greater ex­tent than in most fields, on the per­sonal judge­ment of the ex­pert mak­ing them. Just one prob­lem—per­sonal ex­pert judge­ment gen­er­ally sucks, es­pe­cially when the ex­perts don’t re­ceive im­me­di­ate feed­back on their hits and misses. For­mal mod­els perform bet­ter than ex­perts, but when talk­ing about un­prece­dented fu­ture events such as nan­otech­nol­ogy or AI, the choice of the model is also de­pen­dent on ex­pert judge­ment.

Ray Kurzweil has a model of tech­nolog­i­cal in­tel­li­gence de­vel­op­ment where, broadly speak­ing, evolu­tion, pre-com­puter tech­nolog­i­cal de­vel­op­ment, post-com­puter tech­nolog­i­cal de­vel­op­ment and fu­ture AIs all fit into the same ex­po­nen­tial in­crease. When as­sess­ing the val­idity of that model, we could look at Kurzweil’s cre­den­tials, and maybe com­pare them with those of his crit­ics—but Kurzweil has given us some­thing even bet­ter than cre­den­tials, and that’s a track record. In var­i­ous books, he’s made pre­dic­tions about what would hap­pen in 2009, and we’re now in a po­si­tion to judge their ac­cu­racy. I haven’t been satis­fied by the var­i­ous ac­cu­racy rat­ings I’ve found on­line, so I de­cided to do my own.

Some have ar­gued that we should pe­nal­ise pre­dic­tions that “lack origi­nal­ity” or were “an­ti­ci­pated by many sources”. But hind­sight bias means that we cer­tainly judge many profoundly rev­olu­tion­ary past ideas as “un­o­rigi­nal”, sim­ply be­cause they are ob­vi­ous to­day. And say­ing that other sources an­ti­ci­pated the ideas is worth­less un­less we can quan­tify how main­stream and be­liev­able those sources were. For these rea­sons, I’ll fo­cus only on the ac­cu­racy of the pre­dic­tions, and make no judge­ment as to their ease or difficulty (un­less they say things that were already true when the pre­dic­tion was made).

Con­versely, I won’t be giv­ing any credit for “near misses”: this has the hind­sight prob­lem in the other di­rec­tion, where we fit po­ten­tially am­bigu­ous pre­dic­tions to what we know hap­pened. I’ll be strict about the mean­ing of the pre­dic­tion, as writ­ten. A pre­dic­tion in a pub­lished book is a form of com­mu­ni­ca­tion, so if Kurzweil ac­tu­ally meant some­thing differ­ent to what was writ­ten, then the fault is en­tirely his for not spel­ling it out un­am­bigu­ously.

One exception to that strictness: I’ll be tolerant on the timeline, as I feel that a lot of the predictions were forced into a “ten years from 1999” format. So I’ll count a prediction as accurate if it happened at any point up to the end of 2011, where data is available.

The number of predictions actually made seems to vary from source to source; I used my copy of “The Age of Spiritual Machines”, which seems to be the original 1999 edition. In the chapter “2009”, I counted 63 prediction paragraphs. I then chose ten numbers at random between 1 and 63, and analysed those ten predictions for correctness (those wanting to skip directly to the final score can scroll down). Given Kurzweil’s nationality and location, I will assume all predictions refer only to technologically advanced nations, and specifically to the United States if there is any doubt. Please feel free to comment on my judgements below; we may be able to build a Less Wrong consensus verdict. It would be best if you tried to reach your own conclusions before reading my verdict or anyone else’s. Hence I present the ten predictions, initially without commentary:

  • Pre­dic­tion 5: Cables are dis­ap­pear­ing. Com­mu­ni­ca­tion be­tween com­po­nents, such as point­ing de­vices, micro­phones, dis­plays, print­ers and the oc­ca­sional key­board, uses short-dis­tance wire­less tech­nol­ogy.

  • Pre­dic­tion 7: The ma­jor­ity of text is cre­ated us­ing con­tin­u­ous speech recog­ni­tion (CSR) dic­ta­tion soft­ware, but key­boards are still used. CSR is very ac­cu­rate, far more so than the hu­man tran­scrip­tion­ists who were used up un­til a few years ago.

  • Pre­dic­tion 8: Also ubiquitous are lan­guage user in­ter­faces (LUIs) which com­bine CSR and nat­u­ral lan­guage recog­ni­tion. For rou­tine mat­ters, such as sim­ple busi­ness trans­ac­tions and in­for­ma­tion in­quiries, LUIs are quite re­spon­sive and pre­cise. They tend to be nar­rowly fo­cused, how­ever, on spe­cific types of tasks. LUIs are fre­quently com­bined with an­i­mated per­son­al­ities. In­ter­act­ing with an an­i­mated per­son­al­ity to con­duct a pur­chase or make a reser­va­tion is like talk­ing to a per­son us­ing video con­ferenc­ing, ex­cept the per­son is simu­lated.

  • Pre­dic­tion 18: In the twen­tieth cen­tury, com­put­ers in schools were mostly on the trailing edge, with most effec­tive learn­ing from com­put­ers tak­ing place in the home. Now in 2009, while schools are still not on the cut­ting edge, the profound im­por­tance of the com­puter as a knowl­edge tool is widely recog­nised. Com­put­ers play a cen­tral role in all facets of ed­u­ca­tion, as they do in other spheres of life.

  • Pre­dic­tion 20: Stu­dents of all ages typ­i­cally have a com­puter of their own, which is a thin tabletlike de­vice weigh­ing un­der a pound with a very high re­s­olu­tion dis­play suit­able for read­ing. Stu­dents in­ter­act with their com­put­ers pri­mar­ily by voice and by point­ing with a de­vice that looks like a pen­cil. Key­boards still ex­ist, but most tex­tual lan­guage is cre­ated by speak­ing. Learn­ing ma­te­ri­als are ac­cessed through wire­less com­mu­ni­ca­tion.

  • Pre­dic­tion 26: Print-to-speech read­ing de­vices for the blind are now very small, in­ex­pen­sive, palm-sized de­vices that can read books (those that still ex­ist in pa­per form) and other printed doc­u­ments, and other real-world text such as signs and dis­plays. Th­ese read­ing sys­tems are equally adept at read­ing the trillions of elec­tronic doc­u­ments that are in­stantly available from the ubiquitous wire­less wor­ld­wide net­work.

  • Pre­dic­tion 29: Com­puter-con­trol­led or­thotic de­vices have been in­tro­duced. Th­ese “walk­ing ma­chines” en­able para­plegics to walk and climb stairs. The pros­thetic de­vices are not yet us­able by all para­plegic per­sons, as many phys­i­cally dis­abled per­sons have dys­func­tional joints from years of di­suse. How­ever, the ad­vent of or­thotic walk­ing sys­tems is pro­vid­ing more mo­ti­va­tions to have these joints re­placed.

  • Prediction 44: Intelligent roads are in use, mainly for long-distance travel. Once your car’s guidance system locks into the control sensors on one of these highways, you can sit back and relax. Local roads, though, are still predominantly conventional.

  • Pre­dic­tion 48: There is con­tin­u­ing con­cern with an un­der­class that the skill lad­der has left far be­hind. The size of the un­der­class ap­pears to be sta­ble, how­ever. Although not poli­ti­cally pop­u­lar, the un­der­class is poli­ti­cally neu­tral­ised through pub­lic as­sis­tance and the gen­er­ally high level of af­fluence.

  • Pre­dic­tion 53: Beyond mu­si­cal record­ings, images, and movie videos, the most pop­u­lar type of digi­tal en­ter­tain­ment ob­ject is vir­tual ex­pe­rience soft­ware. Th­ese in­ter­ac­tive vir­tual en­vi­ron­ments al­low you to go white­wa­ter raft­ing on vir­tual rivers, to hang-glide in a vir­tual Grand Canyon, or to en­gage in in­ti­mate en­coun­ters with your favourite movie star. Users also ex­pe­rience fan­tasy en­vi­ron­ments with no coun­ter­part in the phys­i­cal world. The vi­sual and au­di­tory ex­pe­rience of vir­tual re­al­ity is com­pel­ling, but tac­tile in­ter­ac­tion is still limited.

Verdict

    My scale for judg­ing the pre­dic­tions is: true, weakly true, weakly false, false.

    Pre­dic­tion 5: My office and the com­puter I’m typ­ing on seem pretty full of ca­bles. Nev­er­the­less, it is true there has been a rise in wire­less tech­nol­ogy, and wire­less com­puter com­po­nents, even if they’re not ubiquitous. I’ll grade this as a weakly true.

    Prediction 7: I have failed to find proper data for the first claim. Anecdotally, it certainly seems false: keyboards are still in ubiquitous use, and I’ve never personally seen anyone use voice recognition to write documents of any length or even to send texts (a few personal experiments with Siri notwithstanding). The second claim is false: according to an assessment by the National Institute of Standards and Technology, the accuracy of CSR is still nowhere near surpassing human transcription. This lends extra credence to the first claim being false as well: without the diminished error rate, it’s very hard to see CSR being used for the majority of text creation. False.

    Pre­dic­tion 8: Apart from the be­lief that the an­i­mated per­son­al­ity would be vi­sual, this is a near-perfect de­scrip­tion of Siri and similar as­sis­tants. The term “ubiquitous” is tricky, but if we in­ter­pret it to mean “to be found ev­ery­where” (rather than “ev­ery­one has one”), then the pre­dic­tion is weakly true (knocked down from true be­cause of the un­cer­tainty about ubiquity).

    Pre­dic­tion 18: Without need­ing to do the re­search, I think we can take this claim as ev­i­dently true.

    Prediction 20: All the stuff about voice recognition is false. The only device that fits that description today is the smartphone, which had not achieved penetration of more than 50% among teenagers in 2011 (teenagers are the median “students of all ages”; adding in university students as well as pre-teens should lower the proportion, not raise it). “Learning materials are accessed through wireless communication” is hard to interpret, as it doesn’t give any estimate of what proportion of learning material we are talking about. So though we can give Kurzweil kudos for imagining something like the smartphone, the prediction is weakly false.

    Pre­dic­tion 26: One can quib­ble about in­ex­pen­sive, as the prod­ucts seem to be in the $600 range, but those prod­ucts cer­tainly ex­ist for book and mag­a­z­ine read­ing (though not for most signs and dis­plays, as far as I can tell—cer­tainly not in a form the blind can use). The sec­ond sen­tence is true for some screen read­ers, mak­ing the pre­dic­tion es­sen­tially true.

    Pre­dic­tion 29: 2009 timeline wrong, but true in later years.

    Pre­dic­tion 44: The rel­a­tive quan­tifier in the last sen­tence (“though, are still pre­dom­i­nantly con­ven­tional”) makes it clear that we should ex­pect in­tel­li­gent high­ways to be com­mon among long-dis­tance high­ways—this isn’t a few ex­per­i­men­tal roads we’re talk­ing about. Though we have a few self-driv­ing cars, we have noth­ing like the in­tel­li­gent roads im­plied in this pre­dic­tion, which speci­fi­cally im­plies that most cars on those roads will be self-driven. False.

    Prediction 48: The first part of the prediction is true. The second sentence seems false, whether one measures the underclass through relative income (where inequality has been increasing) or through an absolute standard of educational attainment (where the various graduating rates have gone up, implying the underclass is decreasing). There are other ways one could measure the underclass, giving different results. Since one could read the underclass as increasing or decreasing, should we take Kurzweil’s claim that it is stable as the correct mean? No. All that means is that had he spelt out his claim in more detail at the time, it would likely have ended up false. Ambiguity does not make a false statement true. The last sentence is virtually impossible to confirm or refute, so the whole prediction sits between weakly true and weakly false.

    Prediction 53: This is a tricky one. The Wii and similar game consoles seem to fit the bill to some extent. However, the tone suggests he is talking about a virtual reality experience, which is not what we currently have. So, does he mean virtual reality, or does he mean “games like what they had in 1999, except with much better graphics and features”? How would someone at the time have read the prediction? Again, ambiguity cannot be used to make a false statement true. I’m going to work on the assumption that had he merely meant “graphics and features of video games will improve a lot”, he would have said so (certainly his prediction seems to promise much more than that). So the prediction is false.

    But what if he was talking about modern games? For a start, his initial sentence gets the relative size of the industries wrong (though that can be read as a throw-away statement rather than a prediction). He also doesn’t consider things like Facebook games, which make up a large part of the games industry, and are certainly not interactive virtual environments. What about “these virtual environments allow...”? Well, the statement is possibly an utter triviality, claiming that games exist which feature rafting, hang-gliding or erotic situations (that was already true in 1999). Or it claims that features like these are a major component of the most popular games today, which is false (now, if he’d said “blowing things up with a marvellous amount of weapons...”). Fantasy environment is a much more common feature, so I’m taking that as correct. Under this interpretation, the prediction is weakly true and weakly false for games. In total, reading the statement either way, I’ll classify it as (contentiously) weakly false.

    Note: I did read Kurzweil’s as­sess­ment of his own pre­dic­tions, af­ter I had con­ducted my own anal­y­sis. In that as­sess­ment, nearly ev­ery am­bigu­ous clause is in­ter­preted in Kurzweil’s favour. This could be Kurzweil twist­ing the pre­dic­tions in his di­rec­tion; it could be a blatant ex­am­ple of hind­sight bias; or it could be that what Kurzweil meant to say was differ­ent from what he wrote. Un­for­tu­nately, there is no way for us to tell, so we must make do with what was writ­ten and in­ter­pret it as best we can.

    Analysis

      So, out of the ten pre­dic­tions, five are to some ex­tent true, four are to some ex­tent false, and one is un­clas­sifi­able (read­ing through the rest of the pre­dic­tions, com­pletely in­for­mally, these pro­por­tions seem roughly cor­rect).

      Now imagine Kurzweil as a predictor who gives predictions, each with independent probability p of being true (alternatively, assume that a fixed proportion p of the 63 predictions are true, and pretend 63 is high enough that we can treat p as continuous without much loss). If we start with a uniform prior on p between 0 and 1, then we can update given this data. Model prediction 48 as true or false with equal probability. Then the posterior must be proportional to $(1-p)^5 p^5 + (1-p)^4 p^6$:

      This has a mean above 54%, which I’d say is excellent. A prediction record over 50% for a decade that included huge increases in computer power, September 11th and the great recession is intuitively a very good one. Alas, there is no central repository of prediction records from various futurists, but in the absence of that, his track record certainly feels impressive. Don’t let the hindsight bias blind you to how hard this was, and don’t simply think of every prediction as binary: generally, there are far more ways for a prediction to be false than there are for it to be true.
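
      As a rough check on that mean, here is a minimal sketch of the calculation. It assumes the posterior is read as a mixture of two Beta distributions, Beta(6, 6) and Beta(7, 5), each weighted by its Beta function; the helper names `beta_fn` and `mixture_mean` are mine, for illustration only.

```python
from fractions import Fraction
from math import factorial


def beta_fn(a, b):
    """Exact Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def mixture_mean(components):
    """Mean of a density proportional to a sum of p^(a-1) * (1-p)^(b-1) terms.

    Each (a, b) pair is one unnormalised Beta(a, b) term. Its weight in the
    mixture is proportional to B(a, b), and its mean is a / (a + b).
    """
    weights = [beta_fn(a, b) for a, b in components]
    total = sum(weights)
    weighted = sum(w * Fraction(a, a + b) for w, (a, b) in zip(weights, components))
    return weighted / total


# (1-p)^5 p^5 is an unnormalised Beta(6, 6); (1-p)^4 p^6 is an unnormalised Beta(7, 5).
posterior = [(6, 6), (7, 5)]
mean = mixture_mean(posterior)
print(mean, float(mean))
```

      Using exact fractions avoids any rounding questions; the result should come out a little above 0.54, in line with the figure quoted above.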

      On the other hand, if we look at Kurzweil’s own ranking of the predictions he gave in the “Age of Spiritual Machines”, he grades himself as having either 102 out of 108 or 127 out of 147 correct (with caveats that “even the predictions that were considered ‘wrong’ in this report were not all wrong”). I’ve plotted the lower 127/147 ≈ 0.86 accuracy on the above graph; that is very far from being a mean estimate (it’s in the 99th percentile of the probability distribution). But let’s give Kurzweil all we can: we’ll reclassify the arguable prediction 53 as being true (posterior proportional to $(1-p)^4 p^6 + (1-p)^3 p^7$):

      That is still not enough to make his accuracy estimate reasonable: his estimate is in the 96th percentile of the probability distribution. Let’s be even more generous: let’s reclassify the intermediate prediction 48 as also being true (posterior proportional to $(1-p)^3 p^7$):

      Those were very generous adjustments; changing two results is a lot from a sample of ten. But even with the most generous adjustments and taking Kurzweil’s lowest estimate of his own accuracy, he is still extraordinarily overconfident: his estimate is in the 94th percentile of the probability distribution. For fun, I flipped another prediction from false to true: even then, his estimate is in the 81st percentile of the probability distribution (and recall that if we were rigorous about the timeline that Kurzweil claimed, at least one of the true predictions would be false).
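
      For anyone who wants to reproduce these percentiles, here is a minimal sketch. It assumes each posterior is a Beta distribution (or a Beta-function-weighted mixture of two) with integer parameters, and uses the standard identity that the Beta(a, b) CDF at x equals the probability that a Binomial(a + b - 1, x) variable is at least a. The function names and scenario labels are mine, and I use 127/147 rather than the rounded 0.86, which is also an assumption.

```python
from math import comb, factorial


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) at x, for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(a, n + 1))


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers a and b."""
    return factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)


def mixture_cdf(x, components):
    """CDF at x of a density proportional to a sum of unnormalised Beta terms."""
    weights = [beta_fn(a, b) for a, b in components]
    weighted = sum(w * beta_cdf(x, a, b) for w, (a, b) in zip(weights, components))
    return weighted / sum(weights)


claimed = 127 / 147  # Kurzweil's lower self-assessed accuracy

# Each (a, b) pair stands for a term p^(a-1) * (1-p)^(b-1) in the posterior.
scenarios = {
    "as graded": [(6, 6), (7, 5)],
    "53 counted true": [(7, 5), (8, 4)],
    "53 and 48 counted true": [(8, 4)],
    "one more false flipped to true": [(9, 3)],
}

percentiles = {name: mixture_cdf(claimed, comps) for name, comps in scenarios.items()}
for name, value in percentiles.items():
    print(f"{name}: {value:.3f}")
```

      The printed values should land within about a percentage point of the figures quoted above; the small differences depend on whether 0.86 or the exact 127/147 is used as the claimed accuracy.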

      So what can this tell us about Kurzweil as a fu­tur­ist, and about the pre­dic­tions he makes? Essen­tially two points stand out:

      1. He’s most likely good at pre­dict­ing.

      2. He’s most likely over­con­fi­dent, re­luc­tant to ad­mit his misses, and hence un­likely to up­date on his failures.

      So I feel we should take Kurzweil’s pre­dic­tions as a good baseline, with much wider er­ror bars and caveats, pay­ing rel­a­tively less at­ten­tion to those ar­eas where we feel that be­ing a good Bayesian up­dater be­comes im­por­tant. We should thus prob­a­bly pay more at­ten­tion to his mod­els than to his in­ter­pre­ta­tion of his mod­els.