Quick Thoughts on Generation and Evaluation of Hypotheses in a Community

At lunch we talked about hypothesis evaluation and generation, and I thought I’d put some of the thoughts I had into a post.

Here are some points that seem relevant to me when thinking about institution building around this topic.

  • Evaluation often needs to come a long time after the generation process.

    • Andrew Wiles spent over six years in his house trying to prove Fermat’s Last Theorem, and he didn’t try to write up the steps of his research before he proved it. Had he tried to bring a whole academic sub-community along for the ride, I’d guess he wouldn’t have completed the proof before he died. It was correct for him to wait until he had a clear reason why the new math was useful (i.e. he had proven the theorem) before he presented it in public writings+lectures, else it would not have been worth it for the community to learn the speculative new math.

    • Einstein had a similar experience: there was a big gap between the time when he had sufficient evidence to state and trust that the hypothesis was true, and the time when enough evidence was communicable to the physics community for them to believe him.

    • Sometimes it takes a couple of years in a small group to turn an idea into a clearly testable and falsifiable set of predictions, at which point it can be incorporated into the community.

  • However, evaluation by a common set of epistemic standards is required before the model that generated the hypothesis should become common knowledge. It is very costly to make something common knowledge, and it is only worth the cost when the community has good reason to trust it.

  • It’s also important that small groups attempting to test a new intuition or paradigm are able to cease work after a reasonable effort has been made. When I look around the world I see a great number of organisations outliving their project, merely because continuing is by far the path of least resistance. It’s important to be able to try again.

    • I remember someone saying that in Google X, all projects must come with a list of criteria such that if the criteria are ever met, the project must be cancelled. Often they’re quite extreme and don’t get reached, but at least there are actual criteria.

Regarding the current LW community, I see a few pieces of infrastructure missing before we’re able to get all of this right.

  • A process that is able to check a new idea against our epistemic standards and make that evaluation common knowledge.

    • I note that I am typically more optimistic than others about the shared epistemic standards that the LessWrong community has. The general set of models pointed to in Eliezer’s A Technical Explanation of Technical Explanation (e.g. credences that optimise for proper scoring rules, etc.) seems to me to be widely internalised by core folks around here on an implicit level.

      • And this is not to imply that we do not need a meta-level process for improving our epistemic standards, which is obviously of the utmost importance for any long-term project like this.

    • But we currently have no setup for ensuring ideas pass through this before becoming common knowledge. Nearly everyone has people they trust, but there are no common people we all trust (well, maybe one or two, but they tend to have the property of being extremely busy).

  • We’re also missing the incentive to write things up to a technical standard.

    • After I read Scott Alexander’s long and fascinating post “Meditations on Moloch”, I assumed that the term “Moloch” would forever be used around me without my being quite sure what everyone was talking about, or whether they even had reason to be confident they were all using the word correctly. I was delighted when Eliezer put in the work to turn a large portion (I think >50%) of the initial idea into an explicit (and incredibly readable) model that makes clear predictions.

    • But I currently don’t expect this to happen in the general case. I don’t yet expect anyone to put in dozens of hours of distillation work to turn, say, Zvi’s “Slack and the Sabbath” sequence into an extended dialogue on the theory of constraints, or something else that we can actually knowably build a shared model of.

      • Even if we had a setup for evaluating such writings and making them common knowledge, few people are currently trying to write them. People may get disappointed too: they may put in dozens of hours of work only to be told “Not good enough”, which is disheartening.

  • As such, the posts that get widely read tend not to be checked. People with even slightly different epistemic standards build up very different models, and tend to disagree on a multitude of object-level facts and policies. Small groups that have found insights are not able to just build on them in the public domain.

Question: How does the best system we’ve ever built to solve this problem (edit: I mean science) deal with these two parts?

  • There is a common-knowledge epistemic standard in each field, which journals maintain through peer review, though my impression is that most fields have no meta-level process for updating it (cf. the replication crisis).

  • The incentive to pass the community’s epistemic standards is strong: you get financial security and status (both dominance over students and respect from peers). I am not sure how best to offer any of these, though one can imagine a number of high-integrity ways of trying to solve this.

Thoughts on site design that would help with this:

  • I have a bunch of thoughts on this that are harder to write down. I think that the points above on hypothesis generation point to wanting more small-group spaces that can build their own models, something that leans into the bubble thing that exists more on Facebook.

  • Then the evaluative part pushes towards having higher standards for getting into curated sequences and the like (or, rather, building further levels of the hierarchy above curated/curated sequences that are higher status and harder to get into), and having formal criteria for those sections (cf. Ray’s post on peer review and LessWrong, which, having gone back and looked at it, actually overlaps a bunch with this post). This is the one we moved toward more with the initial design of the site.

  • I feel like pushing toward either of these on its own (especially the first one; online bubbles have been pretty destructive to society) might be pretty bad, and am thinking about plots to achieve both simultaneously.

One significant open question, which in writing this down I realise I have not got many explicit gears for, is how to have a process for updating the epistemic standards when someone figures out a way they can be improved. Does anyone have any good examples from history / their own lives / elsewhere where this happened? Maybe a company that had an unusually good process for replacing the CEO?