Quick Thoughts on Generation and Evaluation of Hypotheses in a Community

At lunch we talked about hypothesis generation and evaluation, and I thought I'd write up some of my thoughts as a post.

Here are some points that seem relevant to me when thinking about institution building around this topic.

  • Evaluation often needs to come a long time after the generation process.

    • Andrew Wiles spent over six years working alone in his house trying to prove Fermat’s Last Theorem, and he didn’t try to write up the steps of his research before he had the proof. Had he tried to bring a whole academic sub-community along for the ride, I’d guess he wouldn’t have completed the proof in his lifetime. It was correct for him to wait until he had a clear reason why the new math was useful (i.e. he had proven the theorem) before presenting it in public writings and lectures; otherwise it would not have been worth it for the community to learn the speculative new math.

    • Einstein had a similar experience: there was a big gap between the time when he had sufficient evidence to state and trust that his hypothesis was true, and the time when enough evidence could be communicated to the rest of the physics community for them to believe him.

    • Sometimes it takes a couple of years in a small group to turn an idea into a clearly testable and falsifiable set of predictions, at which point it can be incorporated into the community’s shared models.

  • However, evaluation by a common set of epistemic standards is required before the model that generated the hypothesis should become common knowledge. It is very costly to make something common knowledge, and it is only worth the cost when the community has good reason to trust it.

  • It’s also important that small groups attempting to test a new intuition or paradigm are able to cease work after a reasonable effort has been made. When I look around the world I see a great number of organisations outliving their original project, merely because continuing is by far the path of least resistance. It’s important to be able to try again.

    • I remember someone saying that at Google X, all projects must come with a list of criteria such that if the criteria are ever met, the project must be cancelled. Often they’re quite extreme and don’t get reached, but at least there are actual criteria.

Regarding the current LW community, I see a few pieces of infrastructure missing before we’re able to get all of this right.

  • A process that is able to check a new idea against our epistemic standards and make that evaluation common knowledge

    • I note that I am typically more optimistic than others about the shared epistemic standards that the LessWrong community has. The general set of models pointed to in Eliezer’s A Technical Explanation of Technical Explanation (e.g. credences that optimise for proper scoring rules) seems to me to be widely internalised by core folks around here on an implicit level (I give a small code sketch of the proper-scoring-rule idea just after this list).

      • And this is not to imply that we do not need a meta-level process for improving our epistemic standards, which is obviously of the utmost importance for any long-term project like this.

    • But we currently have no setup for ensuring ideas pass through this process before becoming common knowledge. Nearly everyone has people they trust, but there are no common people we all trust (well, maybe one or two, but they tend to have the property of being extremely busy).

  • We’re also missing the incentive to write things up to a technical standard.

    • After I read Scott Alexander’s long and fascinating post “Meditations on Moloch”, I assumed that the term “Moloch” would forever be used around me without my being quite sure what everyone was talking about, or whether they even had reason to be confident they were all using the word correctly. I was delighted when Eliezer put in the work to turn a large portion (I think >50%) of the initial idea into an explicit (and incredibly readable) model that makes clear predictions.

    • But I currently don’t expect this to happen in the general case. I don’t yet expect anyone to put in dozens of hours of distillation work to turn, say, Zvi’s “Slack and the Sabbath” sequence into an extended dialogue on the theory of constraints, or something else that we can actually knowably build a shared model of.

      • Even if we have a setup for evaluating such write-ups and making them common knowledge, few people are currently trying to write them. People may also get disappointed: they may put in dozens of hours of work only to be told “Not good enough”, which is disheartening.

  • As such, the posts that get widely read tend not to be checked. People with even slightly different epistemic standards build up very different models, and tend to disagree on a multitude of object-level facts and policies. Small groups that have found insights are not able to simply build on them in the public domain.
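As an aside on the “proper scoring rules” point above, here is a minimal sketch of what it means for honest credences to optimise such a rule. The code and the 0.7 credence are illustrative assumptions of mine, not anything from A Technical Explanation of Technical Explanation:

```python
# Minimal sketch: a "proper" scoring rule is one where your expected score
# is maximised by reporting your true credence. We check this numerically
# for the log score, with an assumed true belief of 0.7.
import math

def log_score(report: float, outcome: bool) -> float:
    """Log score for a reported probability, given whether the event occurred."""
    return math.log(report) if outcome else math.log(1 - report)

def expected_score(report: float, true_prob: float) -> float:
    """Expected log score if the event truly occurs with probability true_prob."""
    return (true_prob * log_score(report, True)
            + (1 - true_prob) * log_score(report, False))

true_prob = 0.7  # assumed honest credence (illustrative)
best_report = max((r / 100 for r in range(1, 100)),
                  key=lambda r: expected_score(r, true_prob))
print(best_report)  # -> 0.7: honest reporting maximises expected score
```

The same check fails for an improper rule like the linear score (report if the event happens, one minus the report otherwise), whose expected value is maximised by reporting 0 or 1 rather than your true credence.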

Question: How does the best system we’ve ever built to solve this problem (edit: I mean science) deal with these two missing pieces?

  • There is a common-knowledge epistemic standard in each field, which journals maintain through peer review, though my impression is that most fields have no meta-level process for updating it (cf. the replication crisis).

  • The incentive to pass the community’s epistemic standards is strong: you get financial security and status (both dominance over students and respect from peers). I am not sure how best to offer any of these, though one can imagine a number of high-integrity ways of trying to solve this.

Thoughts on site design that would help with this:

  • I have a bunch of thoughts on this that are harder to write down. I think the points above on hypothesis generation suggest we want more small-group spaces that can build their own models, something that leans into the bubble dynamic that exists more on Facebook.

  • Then the evaluative part pushes towards having higher standards for getting into curated sequences and the like (or, rather, building further levels of the hierarchy above curated posts/curated sequences, levels that are higher status and harder to get into), and having formal criteria for those sections (cf. Ray’s post on peer review and LessWrong, which, having gone back and looked at it, actually overlaps a good deal with this post). This is the direction we moved in more with the initial design of the site.

  • I feel like pushing toward either of these on its own (especially the first one; online bubbles have been pretty destructive to society) might be pretty bad, and am thinking about plots to achieve both simultaneously.

One significant open question, which in writing this down I realise I have not got many explicit gears for, is how to have a process for updating the epistemic standards when someone figures out a way they can be improved. Does anyone have good examples from history / their own lives / elsewhere where this happened? Maybe a company that had an unusually good process for replacing the CEO?