Evaluating expertise: a clear box model


Pur­pose of ex­per­tise modelling

To get what we value we must make good de­ci­sions. To make these de­ci­sions we must know what rele­vant facts are true. But the world is so com­plex that we can­not check ev­ery­thing di­rectly our­selves and so must defer to topic “ex­perts” for some things. How should we choose these ex­perts and how much should we be­lieve what they tell us? In this doc­u­ment, I’ll de­scribe a way to eval­u­ate ex­perts.

Many of the prob­lems in the world, be they poli­ti­cal, eco­nomic, sci­en­tific, or per­sonal, are caused by or ex­ac­er­bated by mak­ing epistemic mis­takes. We trust in the wrong ad­vice and don’t seek out the right ad­vice. We vote for the wrong poli­ti­ci­ans, be­lieve the mar­keters, pro­mote bad bosses, are mes­mer­ized by con­spir­acy the­o­ries, are dis­tracted by the ir­rele­vant, fight with our neigh­bors, lack im­por­tant in­for­ma­tion, suffer ac­ci­dents, and don’t know the best of what has been dis­cov­ered. If we ac­cu­rately know what to do, how to do it, and why to do it, then we be­come more effec­tive and mo­ti­vated.

Types of ex­per­tise modelling

To eval­u­ate these ex­perts in­di­vi­d­u­ally, we can use three meth­ods: black box mod­els, clear box mod­els, or defer­ring fur­ther to other, “meta”, ex­perts about these topic ex­perts (see also this and this).

  • Black box/outside view of the expert: This type of modelling looks only at the expert’s past prediction accuracy, without asking in detail how they arrive at their judgments. Prediction accuracy is ultimately what we want to get at, but track records are sometimes incomplete or do not exist yet.

  • Clear box/​white box/​in­side view of the ex­pert/​in­ter­pretabil­ity: This type of mod­el­ling looks in­side and asks about the spe­cific prop­er­ties of the ex­perts that make them ac­cu­rate. This lets us gauge their opinions when we don’t have a pre­dic­tive track record for them. It also lets us bet­ter es­ti­mate to what ex­tent their ex­per­tise gen­er­al­izes, points out pos­si­ble ways they may err and how to fix these er­rors, and points out how to gain ex­per­tise our­selves and im­prove upon the state-of-the-art.

  • Social reputation/deference pointer: This strategy passes the buck of deciding whether to believe an expert to other, “meta”, experts. It still requires us to evaluate a meta-expert’s ability to evaluate other experts, and so reduces to using black box or clear box models of the meta-expert. This has the advantage of letting us assess something quickly, but the downsides of social biases and of playing a game of telephone.
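As a minimal sketch of the black box view, a track record can be summarized with a scoring rule such as the Brier score (the mean squared error between stated probabilities and what actually happened). The function name and the sample forecasts below are hypothetical, and the Brier score is only one of several scoring rules that could be used here.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect record; always answering 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally sized, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical track records for two experts on the same four questions.
outcomes = [1, 0, 1, 1]            # what actually happened
expert_a = [0.9, 0.2, 0.8, 0.7]    # commits to answers and is mostly right
expert_b = [0.5, 0.5, 0.5, 0.5]    # never commits either way

score_a = brier_score(expert_a, outcomes)
score_b = brier_score(expert_b, outcomes)
print(f"A: {score_a:.3f}  B: {score_b:.3f}")  # lower is better
```

A score like this says nothing about why an expert is accurate, which is the gap the clear box view tries to fill.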

How ex­per­tise mod­el­ling fits within a truth find­ing process

To move towards knowing the truth about a topic, a good process is to work through the following steps:

  1. Figure out one’s own impression: Figure out our independent, without-the-experts (though possibly with-their-models) impression of the topic, for instance by using Fermi modelling and Bayesian analysis, taking into account the limitations in our data, and dealing with the upsides and downsides of using explicit probabilities.

  2. Eval­u­ate ex­perts in­di­vi­d­u­ally: Com­bine the three types of ex­per­tise mod­el­ling above to eval­u­ate ex­perts. Some ad­di­tional tools and per­spec­tives for this: dou­ble crux­ing, frame con­flicts, types of dis­agree­ment, causes of dis­agree­ment anal­y­sis, Bayesian truth serum, mechanism de­sign, the mean­ings of words, bets, anal­y­sis of the dy­nam­ics be­tween ex­perts, and Au­mann’s agree­ment the­o­rem.

  3. Aggregate expert opinions: Aggregate across a sample of the topic experts to figure out what we believe given all of them, for example by using tools like prediction markets (1, 2, 3), the techniques used in superforecasting, the Delphi method, and mimesis.

  4. Com­bine one’s im­pres­sion and the ag­gre­ga­tion of ex­pert opinions into an over­all “all things con­sid­ered” as­sess­ment.

(Further gains can be had by adding complexity and by going back and forth over these steps rather than just down the list. Gains may also be had as a community by using a process like ‘Evidential Reasoning’, referred to here, and perhaps mechanisms like that described here.)
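Steps 3 and 4 can be sketched in a few lines. The example below assumes each expert states a probability strictly between 0 and 1 and that step 2 has produced a trust weight per expert; it then averages in log-odds space, which is one common pooling rule among several and not a method the steps above prescribe. All names and numbers are hypothetical.

```python
import math


def logit(p):
    """Convert a probability (strictly between 0 and 1) to log-odds."""
    return math.log(p / (1.0 - p))


def pool_log_odds(probabilities, weights):
    """Weighted average of probabilities, taken in log-odds space."""
    if len(probabilities) != len(weights) or not probabilities:
        raise ValueError("need equally sized, non-empty sequences")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(w * logit(p) for p, w in zip(probabilities, weights)) / total
    return 1.0 / (1.0 + math.exp(-mean))


# Step 2 output (hypothetical): each expert's probability and our trust in them.
expert_probs = [0.8, 0.6, 0.3]
expert_trust = [3.0, 1.0, 1.0]

# Step 3: aggregate the experts.
experts_view = pool_log_odds(expert_probs, expert_trust)

# Step 4: combine with our own impression from step 1, here given less weight.
own_impression = 0.5
all_things_considered = pool_log_odds([own_impression, experts_view], [1.0, 4.0])

print(f"experts: {experts_view:.2f}  overall: {all_things_considered:.2f}")
```

The trust weights are where the black box, clear box, and reputation models from the previous section would enter.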

Clear box ex­per­tise mod­el­ling

Main sug­gested heuris­tics for clear box ex­per­tise modelling

Let’s zoom in now on clear box mod­el­ling, the pri­mary pur­pose of this post. How do we eval­u­ate when oth­ers know more about a topic than our­selves? How do we com­pare ex­perts? How can we know how much some­one knows about a com­plex topic and how clear their think­ing is about it?

Loosely in­spired by AI the­ory, I be­lieve that some good heuris­tic fea­tures to fo­cus on are the fol­low­ing (see also this post that makes some similar points):

  • Data/unsupervised learning data quality (1, 2): How much data and feedback have these experts been exposed to? If they have had short, accurate feedback loops (not noisy, irrelevant, or statistically biased) over a long period, and if the problem is simple, then they probably have enough data to develop good explicit and intuitive models. Have they been exposed to alternative models and had their own thoughts subjected to feedback and criticism? Also, if they have knowledge and skills in a closely related domain, these may transfer to this topic. The AI analogue is the amount and quality of training data they have, and it roughly corresponds to the units of accuracy/virtual cycle.

  • In­cen­tives/​mo­ti­va­tion/​su­per­vised learn­ing: How mo­ti­vated are they to­wards un­der­stand­ing the topic? How mo­ti­vated are they to share that knowl­edge with­out dis­tor­tion? If their in­cen­tives are al­igned with yours and they are paid to be ac­cu­rate about the topic then there is a good chance they are mo­ti­vated to give you the cor­rect an­swer. This fac­tor can be bro­ken up into in­cen­tives to come to know the truth per­son­ally and in­cen­tives to con­vey what they be­lieve ac­cu­rately. The AI analogue is get­ting the right re­in­force­ment, cor­rectly la­beled data, or hav­ing the right util­ity func­tion and roughly cor­re­sponds to the units of cy­cles/​cy­cles.

  • Com­pute: How neu­rolog­i­cally in­tel­li­gent (the neu­rolog­i­cal con­tri­bu­tion to IQ) are they, how cre­atively are they tak­ing into ac­count di­verse per­spec­tives, and how at­ten­tively fo­cused on the task are they? This fac­tor is a bit messy be­cause of the neu­ro­science, but roughly cor­re­sponds to raw neu­rolog­i­cal speed, neu­rolog­i­cal par­allelism, low level neu­rolog­i­cal wiring effi­ciency, neu­ro­plas­tic­ity, and work­ing mem­ory size act­ing as a mem­ory cache to speed things up. The AI analogue is hav­ing a lot of com­pute per unit time and cor­re­sponds to the units of cy­cles/​sec­ond.

  • Effective thinking: How good are they at thinking about the problems, in terms of rationality and meta-cognition? If they don’t have good general methods to learn, think, and find mental errors, then they are likely to be inefficient at figuring out the truth. Effective thinking methods are partially dependent on the topic. The AI analogue is having efficient algorithms that approximate solutions well and don’t have systematic biases, and it roughly corresponds to the units of virtual cycles/cycle.

  • Time: How long have they been thinking about the topic at hand? Even given all the above, if they just haven’t had time to think about the question they may very well not give a good answer. The AI analogue is how deep a search process has progressed, or simply how much compute time has occurred, and it corresponds to the units of seconds.

(Note that how necessary each of these heuristics is will depend on the specific topic and its type of difficulty.)

A Fermi pseudo equa­tion (the math­e­mat­i­cal ver­sion of pseu­docode) to sum­ma­rize this:
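One reading consistent with the units listed for each factor is a simple product, since those units cancel to leave accuracy. This is a sketch reconstructed from the units alone, assuming no extra weights or exponents:

```latex
\[
\text{Expertise} \approx \text{Data} \times \text{Incentives} \times \text{Compute} \times \text{Effective thinking} \times \text{Time}
\]
\[
\frac{\text{accuracy}}{\text{virtual cycle}} \times \frac{\text{cycles}}{\text{cycles}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{virtual cycles}}{\text{cycle}} \times \text{seconds} = \text{accuracy}
\]
```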

The im­por­tance of each fac­tor would vary by topic. As a heuris­tic com­posed of heuris­tics, I think this is a good start.

Use of clear box ex­per­tise modelling

Th­ese fac­tors can be used ei­ther in Fermi pseudo equa­tion form, or as a check­list to com­pare ex­perts and help en­sure you con­sider all rele­vant fac­tors. (See here for the use­ful­ness of check­lists.)
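As a rough illustration of the equation form, the sketch below rates each of the five factors on a 0-to-1 scale and multiplies them, with optional per-topic weights applied as exponents. The scale, the exponent weighting, and every name and number here are assumptions made for illustration; this is not the formula used by the linked calculator.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExpertProfile:
    """Rough 0-to-1 ratings for the five factors (the scale is an assumption)."""
    data: float
    incentives: float
    compute: float
    effective_thinking: float
    time: float


def expertise_score(profile, weights=None):
    """Weighted product of the factors, with weights acting as exponents.

    A weight of 0 ignores a factor; a larger weight makes it matter more.
    Factors without an explicit weight default to 1.
    """
    weights = weights or {}
    score = 1.0
    for f in fields(profile):
        value = getattr(profile, f.name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{f.name} must be in [0, 1]")
        score *= value ** weights.get(f.name, 1.0)
    return score


# Two hypothetical experts rated against the checklist.
veteran = ExpertProfile(data=0.9, incentives=0.4, compute=0.7,
                        effective_thinking=0.7, time=0.9)
newcomer = ExpertProfile(data=0.3, incentives=0.9, compute=0.8,
                         effective_thinking=0.8, time=0.2)

# Equal weights favour the veteran's data and time on the topic.
print(expertise_score(veteran) > expertise_score(newcomer))

# On a topic where incentives dominate, the ranking can flip.
topic_weights = {"incentives": 3.0}
print(expertise_score(newcomer, topic_weights) > expertise_score(veteran, topic_weights))
```

Because the factors multiply, a very low rating on any one of them pulls the whole score down, which matches the idea that each factor can be a bottleneck.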

Th­ese heuris­tics can also be used con­struc­tively when try­ing to be­come an ex­pert in a topic or when teach­ing oth­ers, as these are fac­tors to op­ti­mize for in or­der to un­der­stand a topic. They also give a sense of how much you know your­self in com­par­i­son (to oth­ers and in an ab­solute sense) so you can know how hum­ble you should be and how much you have yet to learn.

Finally, once you have evaluated someone’s expertise, you can use that information in your truth finding processes, which you in turn use to make decisions and achieve your goals and values.

In the spirit of pro­vid­ing mod­els that peo­ple can in­ter­act with I have pro­vided a sim­ple on­line calcu­la­tor for the ex­per­tise equa­tion heuris­tic:

Ex­per­tise Calculator

(this is very much a rough draft calcu­la­tor and, with its guessed weights, tries to cover the vast range of ex­per­tise from your dog Spot con­sid­er­ing the topic for a mo­ment to Ein­stein de­vot­ing his life to it)

My thanks to Ozzie Gooen, David Kristoffers­son, De­nis Drescher, Michael Aird, Mar­cello Her­reshoff, Siebe Rozen­dal, Eliz­a­beth, Dan Bur­foot, Gre­gory Lewis, Spencer Green­berg, Shri Sam­son, An­dres Gomez Emils­son, Alexey Turchin, and Rem­melt Ellen for re­view­ing and pro­vid­ing helpful feed­back on the ar­ti­cle.