Thrasymachus comments on Contra double crux

Thrasymachus 8 Oct 2017 22:36 UTC
16 points
1
Hello Duncan,

My thanks for your reply. I apologise if my wording in the OP was inflammatory or unnecessarily ‘triggering’ (another commenter noted an ‘undertone of aggression’, which I am sorry for, although I promise it wasn’t intended—you are quoted repeatedly as you wrote the canonical exposition for what I target in the OP, rather than some misguided desire to pick a fight with you on the internet). I hope I capture the relevant issues below, but apologies in advance if I neglect or mistake any along the way.

CFAR’s several hundred and the challenge of insider evidence
I was not aware of the of the several hundred successes CFAR reports of double crux being used ‘in the wild’. I’m not entirely sure whether the successes are a) those who find double crux helpful or b) particular instances of double crux resolving disagreement, but I think you would endorse plenty of examples of both. My pretty sceptical take on double crux had ‘priced in’ the expectation of CFAR instructors at least some/many alums thought it was pretty nifty.
You correctly anticipate the sort of worries I would have about this sort of evidence. Self-reported approbation from self-selected participants is far from robust. Given branches of complementary medicine can probably tout thousands to millions of ‘positive results’ and happy customers, yet we know it is in principle intellectually bankrupt, and in practice performs no better than placebo in properly conducted trials. (I regret to add replies along the lines of, “if you had received the proper education in the technique you’d—probably—see it works well”, or “I’m a practicioner with much more experience than any doubter in terms of using this, and it works in my experience” also have analogies here).
I don’t think one need presume mendacity on the part of CFAR, nor gullibility on the part of workshop attendees, to nonetheless believe this testimonial evidence isn’t strongly truth-tracking: one may anticipate similarly positive reports in worlds where (perhaps) double crux doesn’t really work, but other stuff CFAR practices does work, and participants enjoy mingling with similarly rationally minded participants, may have had to invest 4 figure sums to get on the workshop, and so on and so forth. (I recall CFAR’s previous evaluation had stupendous scores on self-reported measures, but more modest performance on objective metrics).
Of course, unlike complementary medicine, double crux does not have such powerful disconfirmation as ‘violates known physics’ or ‘always fails RCTs’. Around the time double crux was proposed I challenged double crux on theoretical grounds (i.e. double cruxes should be very rare), this post was prompted by some of the dissonance on previous threads, but also the lack of public examples of double crux working. Your experience of the success of double crux in workshops is essentially private evidence (at least for now): in the same way it is hard to persuade me of its validity, it is next to impossible for me to rebut it. I nonetheless hope other lines of inquiry are fruitful.
How sparse is a typical web of belief?
One is the theoretical point. I read in your reply disagreement on the ‘double cruxes’ should be rare point (”… the number of people who have been surprised to discover that double cruxes do in fact exist and are findable an order of magnitude more often than one would have naively guessed”). Although you don’t include it as a crux in your reply, it looks pretty crucial to me. If cruxes are as rare as I claim, double cruxing shouldn’t work, and so the participant reports are more likely to have been innocently mistaken.
In essence, I take the issue to be around what web of belief surrounding a typical subject of disagreement looks like. It seems double crux is predicated on this being pretty sparse (at least in terms of important considerations): although lots of beliefs might have some trivial impact on your credence B, B is mainly set by small-n cruxes (C), which are generally sufficient to change ones mind if ones attitude towards them changes.
By contrast, I propose the relevant web tends to be much denser (or, alternatively, the ‘power’ of the population of reasons that may alter ones credence in B is fairly evenly distributed). Credence in B arises from a large number of considerations that weigh upon it, each of middling magnitude. Thus even if I am persuaded one is mistaken, my credence in B does not change dramatically. It follows that ‘cruxes’ are rare, and so two people happening to discover their belief on some recondite topic B is principally determined by some other issue (C), and it is the same for both of them is rare.
This is hard to make very crisp, as (among others) the ‘space of all topics reasonable people disagree’ are hard to pin down. Beyond appeals to my own experience and introspection (“do you really find your belief in (let’s say, some political view like gay marriage or abortion) depends on a single consideration to such a degree that, if it was refuted, you would change your view?) I’d want to marshal a couple of other considerations.
1. When one looks at a topic in philosophy, or science, or many other fields of enquiry, one usually sees a very one-to-many relationship of the topic to germane considerations. A large number of independent lines of evidence support the theory of evolution; a large number of arguments regarding god’s existence in philosophy receive scrutiny (and in return they spawn a one-to-many relationship of argument to objections, objection to counter-objections). I suggest this offers analogical evidence in support of my thesis.
2. Constantin’s report of double cruxing (which has been used a couple of times as an exemplar in other threads) seems to follow the pattern I expect. I struggle to identify a double-crux in the discussion Constatin summarizes: most of the discussion seems to involve whether Salvatier’s intellectual project is making much progress, with then a host of subsidiary considerations (e.g. how much to weigh ‘formal accomplishments’, the relative value of more speculative efforts on far future considerations, etc.), but it is unclear to me if Constantin was persuaded Salvatier’s project was making good progress this would change her mind about the value of the rationalist intellectual community (after all, one good project may not be adequate ‘output’) or vice versa (even if Salvatier recognises his own project was not making good progress, the rationality community might still be a fertile ground to cultivate his next attempt, etc.)
What comprises double-crux?
I took the numbered list of my counter-proposal to have 25% overlap with double crux (i.e. realising your credences vary considerably), not 85%. Allow me to be explicit on how I see 2-4 in my list are in contradistinction to the ‘double crux algorithm’:
- There’s no assumption of an underlying single ‘crux of the matter’ between participants, or for either individually.
- There’s no necessity for a given consideration (even the strongest identified) to be individually sufficient to change ones mind about B
- There’s also no necessity for the strongest considerations proposed by X and Y to have common elements.
- There’s explicit consideration of credence resilience. Foundational issues may by ‘double cruxes’ in that (e.g.) my views on most applied ethics questions would change dramatically if I were persuaded of the virtue ethics my interlocutor holds, but one often makes more progress discussing a less resilient non-foundational claim even if ‘payoff’ in terms of the subsequent credence change in the belief of interest is lower.
This may partly be explained by a broader versus narrower conception of double crux. I take the core idea of double crux to be the ‘find some C for which your disagreement over B relies upon, then discuss C’ (this did, in my defense, comprise the whole of the ‘how to play’ section in the initial write-up). I take you to holding a broader view, where double crux incorporates other related epistemic practices, and it has value in toto.
My objection is expressly this. Double crux is not essential for these incorporated practices. So one can compare discussion with the set of these other practices to this set with the addition of double crux. I aver the set sans double crux will lead to better discussions.
Pedagogy versus performance
I took double crux was mainly being proposed as a leading strategy to resolve disagreement. Hence the comparison to elite philosophers was to suggest it wasn’t a leading strategy by pointing to something better. I see from this comment (and the one you split off into its own thread) you see it more as a more a pedagogical role—even if elite performers do something different, it does valuable work in improving skills. Although I included a paragraph about its possible pedagogical value (admittedly one you may have missed as I started it with a self-indulgent swipe at the rationalist community), I would have focused more on this area had I realised it was CFAR’s main contention.
I regret not to surprise you with doubts about the pedagogical value as well. This mostly arises from the above concerns: if double cruxes are as rare as I propose, it is unclear how searching for them is that helpful an exercise. A related worry (related to the top of the program) is this seems to entail increasing reliance on private evidence regarding whether the technique works: in principle objections to the ‘face value’ of the technique apply less (as it is there to improve skills rather than a proposal for what the ‘finished article’ should look like); adverse reports from non-CFAR alums don’t really matter (you didn’t teach them, so it is no surprise they don’t get it right). What one is left with is the collective impressions of instructors, and the reports of the students.
I guess I have higher hopes for transparency and communicability of ‘good techniques’. I understand CFAR is currently working on further efforts to evaluate itself. I hope to be refuted by the forthcoming data.
- ozymandias 8 Oct 2017 22:50 UTC
  14 points
  1
  Parent
  I want to bring up sequence thinking and cluster thinking, which I think are useful in understanding the disagreement here. As I understand it, Duncan argues that sequence thinking is more common than cluster thinking, and you’re arguing the converse.
  I think most beliefs can be put in either a cluster-thinking or a sequence-thinking framework. However, I think that (while both are important and useful) cluster thinking is generally more useful for coming up with final conclusions. For that reason, I’m suspicious of double crux, because I’m worried that it will cause people to frame their beliefs in a sequence-thinking way and feel like they should change their beliefs if some important part of their sequence was proven wrong, even though (I think) using cluster thinking will generally get you more accurate answers.
  - dxu 8 Oct 2017 23:51 UTC
    7 points
    −1
    Parent
    As I understand it, Duncan argues that sequence thinking is more common than cluster thinking, and you’re arguing the converse.
    This looks remarkably like an attempt to identify a crux in the discussion. Assuming that you’re correct about double-cruxing being problematic due to encouraging sequence-like thinking: isn’t the quoted sentence precisely the kind of simplification that propagates such thinking? Conversely, if it’s not a simplification, doesn’t that provide (weak) evidence in favor of double-cruxing being a useful tool in addressing disagreements?
    - ozymandias 9 Oct 2017 0:03 UTC
      10 points
      1
      Parent
      I think that sequence thinking is important and valuable (and probably undersupplied in the world in general, even while cluster thinking is undersupplied in the rationalist community in specific). However, I think both Thrasymachus and Duncan are doing cluster thinking here—like, if Duncan were convinced that cluster thinking is actually generally a better way of coming to final decisions, I expect he’d go “that’s weird, why is CFAR getting such good results from teaching double crux anyway?” not “obviously I was wrong about how good double crux is.” Identifying a single important point of disagreement isn’t a claim that it’s the only important point of disagreement.
      - Duncan Sabien (Inactive) 9 Oct 2017 2:24 UTC
        6 points
        0
        Parent
        I like this point a lot, and your model of me is accurate, at least insofar as I’m capable of simming this without actually experiencing it. For instance, I have similar thoughts about some of my cutting/oversimplifying black-or-white heuristics, which seem less good than the shades-of-gray epistemics of people around me, and yet often produce more solid results. I don’t conclude from this that those heuristics are better, but rather that I should be confused about my model of what’s going on.
        the gears to ascension 9 Oct 2017 2:30 UTC
        5 points
        0
        Parent
        that makes a ton of sense for theoretically justified reasons I don’t know how to explain yet. anyone want to collab with me on a sequence? I’m a bit blocked on 1. exactly what my goal is and 2. what I should be practicing in order to be able to write a sequence (given that I’m averse to writing post-style content right now)
      - dxu 9 Oct 2017 0:18 UTC
        6 points
        0
        Parent
        Naturally, and I wasn’t claiming it was. That being said, I think that when you single out a specific point of disagreement (without mentioning any others), there is an implication that the mentioned point is, if not the only point of disagreement, then at the very least the most salient point of disagreement. Moreover, I’d argue that if Duncan’s only recourse after being swayed regarding sequence versus cluster thinking is “huh, then I’m not sure why we’re getting such good results”, then there is a sense in which sequence versus cluster thinking is the only point of disagreement, i.e. once that point is settled, Duncan has no more arguments.
        (Of course, I’m speaking purely in the hypothetical here; I’m not trying to make any claims about Duncan’s actual epistemic state. This should be fairly obvious given the context of our discussion, but I just thought I’d throw that disclaimer in there.)
        Duncan Sabien (Inactive) 9 Oct 2017 2:24 UTC
        5 points
        0
        Parent
        Oh, hmm, this is Good Point Also.
- Duncan Sabien (Inactive) 8 Oct 2017 23:35 UTC
  9 points
  0
  Parent
  First off, a symmetric apology for any inflammatory or triggering nature in my own response, and an unqualified acceptance of your own, and reiterated thanks for writing the post in the first place, and thanks for engaging further. I did not at any point feel personally attacked or slighted; to the degree that I was and am defensive, it was over a fear that real value would be thrown out or socially disfavored for insufficient reason.
  (I note the symmetrical concern on your part: that real input value will be thrown out or lost by being poured into a socially-favored-for-insufficient-reason framework, when other frameworks would do better. You are clearly motivated by the Good.)
  You’re absolutely right that the relative lack of double cruxes ought be on my list of cruxes. It is in fact, and I simply didn’t think of it to write it down. I highly value double crux as a technique if double cruxes are actually findable in 40-70% of disagreements; I significantly-but-not-highly value double crux if double cruxes are actually findable in 25-40% of disagreements; I lean toward ceasing to investigate double crux if they’re only findable in 10-25%, and I am confused if they’re rarer than 10%.
  By contrast, I propose the relevant web tends to be much denser (or, alternatively, the ‘power’ of the population of reasons that may alter ones credence in B is fairly evenly distributed). Credence in B arises from a large number of considerations that weigh upon it, each of middling magnitude. Thus even if I am persuaded one is mistaken, my credence in B does not change dramatically. It follows that ‘cruxes’ are rare, and so two people happening to discover their belief on some recondite topic B is principally determined by some other issue (C), and it is the same for both of them is rare.
  I agree that this is a relevant place to investigate, and at the risk of proving you right at the start, I add it to my list of things which would cause me to shift my belief somewhat.
  The claim that I derive from “there’s surprisingly often one crux” is something like the following: that, for most people, most of the time, there is not in fact a careful, conscious, reasoned weighing and synthesis of a variety of pieces of evidence. That, fompmott, the switch from “I don’t believe this” to “I now believe this” is sudden rather than gradual, and, post-switch, involves a lot of recasting of prior evidence and conclusions, and a lot of further confirmation-biased integration of new evidence. That, fompmott, there are a lot of accumulated post-hoc justifications whose functional irrelevance may not even be consciously acknowledged, or even safe to acknowledge, but whose accumulation is strongly incentivized given a culture wherein a list of twenty reasons is accorded more than 20x the weight of a list of one reason, even if nineteen of those twenty reasons are demonstrated to be fake (e.g. someone accused of sexual assault, acquitted due to their ironclad alibi that they were elsewhere, and yet the accusation still lingers because of all the sticky circumstantial bits that are utterly irrelevant).
  In short, the idealized claim of double crux is that people’s belief webs look like this:
  <insert>
  Whereas I read you claiming that people’s belief webs look like this:
  <insert>
  And on reflection and in my experience, the missing case that tilts toward “double crux is surprisingly useful” is that a lot of belief webs look like this:
  <insert>
  … where they are not, in fact, simplistic and absolutely straightforward, but there often is a crux which far outweighs all of the other accumulated evidence.
  I note that, if correct, this theory would indicate that e.g. your average LessWronger would find less value in double crux than your average CFAR participant (who shares a lot in common with a LessWronger but in expectation is less rigorous and careful about their epistemics). This being because LessWrongers try very deliberately to form belief webs like the first image, and when they have a belief web like the third image they try to make that belief feel to themselves as unbalanced and vulnerable as it actually is. Ergo, LessWrongers would find the “Surprise! You had an unjustified belief!” thing happening less often and less unexpectedly.
  If I’m reading you right, this takes care of your first bullet point above entirely and brings us closer to a mutual understanding on your second bullet point. Your third bullet point remains entirely unaddressed in double crux except by the fact that we often have common cultural pressures causing us to have aligned-or-opposite opinions on many matters, and thus in practice there’s often overlap. Your fourth bullet point seems both true and a meaningful hole or flaw in double crux in its idealized, Platonic form, but also is an objection that in practice is rather gracefully integrated by advice to “keep ideals in mind, but do what seems sane and useful in the moment.”
  To the extent that those sections of your arguments which miss were based on my bad explanation, that’s entirely on me, and I apologize for the confusion and the correspondingly wasted time (on stuff that proved to be non-crucial!). I should further clarify that the double crux writeup was conceived in the first place as “well, we have a thing that works pretty well when transmitted in person, but people keep wanting it not transmitted in person, partly because workshops are hard to get to even though we give the average EA or rationalist who can’t afford it pretty significant discounts, so let’s publish something even though it’s Not Likely To Be Good, and let’s do our best to signal within the document that it’s incomplete and that they should be counting it as ‘better than nothing’ rather than judging it as ‘this is the technique, and if I’m smart and good and can’t do it from reading, then that’s strong evidence that the technique doesn’t work for me.’” I obviously did not do enough of that signaling, since we’re here.
  Re: the claim “Double crux is not essential for these incorporated practices.” I agree wholeheartedly on the surface—certainly people were doing good debate and collaborative truthseeking for millennia before the double crux technique was dreamed up.
  I would be interested in seeing a side-by-side test of double crux versus direct instruction in a set of epistemic debate principles, or double crux versus some other technique that purports to install the same virtues. We’ve done some informal testing of this within CFAR—in one workshop, Eli Tyre and Lauren Lee taught half the group double crux as it had always previously been taught, while I discussed with the other half all of the ways that truthseeking conversations go awry, and all of the general desiderata for a positive, forward-moving experience. As it turned out, the formal double crux group did noticeably better when later trying to actually resolve intellectual disagreement, but the strongest takeaway we got from it was that the latter group didn’t have an imperative to operationalize their disagreement into concrete observations or specific predictions, which seems like a non-central confound to the original question.
  As for “I guess I have higher hopes for transparency and communicability of ‘good techniques’,” all I can do is fall back yet again on the fact that, every time skepticism of double crux has reared its head, multiple CFAR instructors and mentors and comparably skilled alumni have expressed willingness to engage with skeptics, and produce publicly accessible records and so forth. Perhaps, since CFAR’s the one claiming it’s a solid technique, 100% of the burden of creating such referenceable content falls on us, but one would hope that the relationship between enthusiasts and doubters is not completely antagonistic, and that we could find some Robin Hansons to our Yudkowskys, who are willing to step up and put their skepticism on the line as we are with our confidence.
  As of yet, not a single person has sent me a request of the form “Okay, Duncan, I want to double crux with you about X such that we can write it down or video it for others to reference,” nor has anyone sent me a request of the form “Okay, Duncan, I suspect I can either prove double crux unworth it or prove [replacement Y] a more promising target. Let’s do this in public?”
  I really really do want all of us to have the best tool. My enthusiasm for double crux has nothing to do with an implication that it’s perfect, and everything to do with a lack of visibly better options. If that’s just because I haven’t noticed something obvious, I’d genuinely appreciate having the obvious pointed out, in this case.
  Thanks again, Thrasymachus.
  - Thrasymachus 9 Oct 2017 22:21 UTC
    5 points
    0
    Parent
    Thank you for your gracious reply. I interpret a couple of overarching themes in which I would like to frame my own: the first is the ‘performance issue’ (i.e. ‘How good is double crux at resolving disagreement/getting closer to the truth’); the second the ‘pedagogical issue’ (i.e. ‘how good is double crux at the second order task of getting people better at resolving disagreement/getting closer to the truth’). I now better understand you take the main support from double crux to draw upon the latter issue, but I’d also like to press on some topics about the former on which I believe we disagree.
    How well does double crux perform?
    Your first two diagrams precisely capture the distinction I have in mind (I regret not having thought to draw my own earlier). If I read the surrounding text right (I’m afraid not to know what ‘fompmott’ means, and google didn’t help me), you suggest that even if better cognisers find their considerations form a denser web like the second diagram, double-crux amenable ‘sparser’ webs are still common in practice, perhaps due to various non-rational considerations. You also add:
    I note that, if correct, this theory would indicate that e.g. your average LessWronger would find less value in double crux than your average CFAR participant (who shares a lot in common with a LessWronger but in expectation is less rigorous and careful about their epistemics). This being because LessWrongers try very deliberately to form belief webs like the first [I think second? - T] image, and when they have a belief web like the third image they try to make that belief feel to themselves as unbalanced and vulnerable as it actually is. Ergo, LessWrongers would find the “Surprise! You had an unjustified belief!” thing happening less often and less unexpectedly.
    This note mirrors a further thought I had (c.f. Ozymandias’s helpful remark in a child about sequence versus cluster thinking). Yet I fear this poses a further worry for the ‘performance issue’ of double crux, as it implies that the existence of cruxes (or double cruxes) may be indicative of pathological epistemic practices. A crux implies something like the following:
    You hold some belief B you find important (at least, important enough you think it is worth your time to discuss).
    Your credence in B depends closely on some consideration C.
    Your credence in C is non-resilient (at least sufficiently non-resilient you would not be surprised to change your mind on it after some not-unduly-long discussion with a reasonable interlocutor).*
    * What about cases where one has a resilient credence in C? Then the subsequent worries do not apply. However, I suspect these cases often correspond to “we tried to double crux and we found we couldn’t make progress on resolving our disagreement about theories of truth/normative ethics/some other foundational issue”.
    It roughly follows from this you should have low resilience in your credence of B. As you note, this is vulnerable, and knowing one had non-resilient credences in important Bs is to be avoided.
    As a tool of diagnosis, double crux might be handy (i.e. “This seems to be a crux for me, yet cruxes aren’t common among elite cognisers—I should probably go check whether they agree this is the crux of this particular matter, and if not maybe see what else they think bears upon B besides C”). Yet (at least per the original exposition) it seems to be more a tool for subsequent ‘treatment’. Doing so could make things worse, not better.
    If X and Y find they differ on some crux, but also understand that superior cognisers tend not to have this crux, and distribute support across a variety of considerations, it seems a better idea for them to explore other candidate considerations rather than trying to resolve their disagreement re. C. If they instead do the double-cruxy thing and try and converge on C, they may be led up the epistemic garden path. They may agree with one another on C (thus B), and thus increase their resilience of C (thus B), yet they also confirm a mistaken web of belief around B which wrongly accords too much weight to C. If (as I suggest) at least half the battle on having good ‘all things considered’ attitudes to recondite matters comprises getting the right weights for relevant considerations on the matter, double crux may celebrate them converging further away from the truth. (I take this idea to be expressed in kernel in Ozymandias’s worry of double crux displacing more-expectedly-accurate cluster thinking with less-expectedly-accurate sequence thinking).
    How good is double crux at ‘levelling people up at rationality’
    The substantial independence of the ‘performance issue’ from the pedagogical issue’
    In the same way practising scales may not be the best music, but make one better at playing music, double crux may not be the best discussion technique, but make one better at discussions. This seems fairly independent of its ‘object level performance’ (although I guess if the worry above is on the right track, we would be very surprised if a technique that on the object level leads beliefs to track truth more poorly nonetheless has a salutatory second-order effect).
    Thus comparisons to practices of elite philosophers (even if they differ) are inapposite—especially, as I understand from one of them, the sort of superior pattern I observe occurs only at a far right tail even among philosophers (i.e. ‘world-class’ as you write, rather then ‘good’, as I write in the OP). It is obviously a great boon if I could get some fraction more like someone like Askell or Shulman without either their profound ability or the time they have invested in these practices.
    On demurring the ‘double crux challenge’
    I regret I don’t think it would be hugely valuable to ‘try double crux’ with an instructor in terms of resolving this disagreement. One consideration (on which more later) is that conditional on me not being persuaded by a large group of people who self-report double crux is great, I shouldn’t change my mind (for symmetry reasons) if this number increases by one other person, or it increases by including me. Another is that the expected yield may not be great, at least in one direction: although I hope I am not ‘hostile’ to double crux, it seems one wouldn’t be surprised if it didn’t work with me, even if its generally laudable.
    Yet I hope I am not quite as recalcitrant as ‘I would not believe until I felt the stigmata with my own hands’. Apart from a more publicly legible case (infra), I’m a bit surprised at the lack of ‘public successes’ of double cruxing (although this may confuse performance versus pedagogy). In addition to Constantin, Raemon points to their own example with gjm. Maybe I’m only seeing what I want to, but I get a similar impression. They exhibit a variety of laudable epistemic practices, but I don’t see a crux or double crux (what they call ‘cruxes’ seem to be more considerations they take to be important).
    The methods of rational self-evaluation
    You note a head-to-head comparison between double crux and an approximate sham-control seemed to favour double crux. This looks like interesting data, and it seems a pity it emerges in the depths of a comment thread (ditto the ‘large n of successes’) rather than being written up and presented—it seems unfortunate that the last ‘public evaluation report’ is about 2 years old. I would generally urge trying to produce more ‘public evidence’ rather than the more private “we’ve generally seen this work great (and a large fraction of our alums agree!)”
    I recognise that “Provide more evidence to satisfy outside sceptics” should not be high on CFAR’s priority list. Yet I think it is instrumental to other important goals instead. Chiefly: “Does what we are doing actually work?”
    You noted in your initial reply undercutting considerations to the ‘we have a large n of successes’, yet you framed this in way that these would often need to amount to a claim of epistemic malice (i.e. ‘either CFAR is lying or participants are being socially pressured into reporting a falsehood’). I don’t work at a rationality institute or specialise in rationality, but on reflection I find this somewhat astonishing. My impression of cognitive biases were that they were much more insidious, that falling prey to them was the rule rather than the exception, and that sincere good faith was not adequate protection (is this not, in some sense, what CFAR casus belli is predicated upon?)
    Although covered en passant, let me explicitly (although non-exhaustively) list things which might bias more private evidence of the type CFAR often cites:
    CFAR staff (collectively) are often responsible for developing the interventions they hope will improve rationality. One may expect them to be invested in them, and more eager to see that they work than see they don’t (c.f. why we prefer double-blinding over single-blinding).
    Other goods CFAR enjoys (i.e. revenue/funding, social capital) seem to go up the better the results of their training. Thus CFAR staff have a variety of incentives pushing them to over-report how good their ‘product’ is (c.f. why conflicts of interest are bad, the general worries about pharma-funded drug trials).
    Many CFAR participants have to spend quite a lot of money (i.e. fees and travel) to attend a workshop. They may fear looking silly if it turns out after all this it didn’t do anything, and so incentivised to assert it was much more helpful than it actually was (c.f. choice supportive bias).
    There are other aspects of CFAR workshops that participants may enjoy independent of the hoped-for improvement of their rationality (e.g. hanging around interesting people like them, personable and entertaining instructors, romantic entanglements). This extraneous benefits may nonetheless bias upwards their estimate of how effective CFAR workshops are at improving their rationality (c.f. halo effect).
    I am sure there are quite a few more. One need not look that hard to find lots of promising studies supporting a given intervention undermined by any one of these.
    The reference class of interventions with “a large corpus of (mainly self-reported) evidence of benefit, but susceptible to these limitations” is dismal. It includes many branches of complementary medicine. It includes social programs (e.g. ‘scared straight’) that we now know to be extremely harmful. It includes a large number of ineffective global poverty interventions. Beyond cautionary tales, I aver these approximate the modal member of the class: when the data is so subjective, and the limitations this severe, one should expect the thing in question doesn’t actually work after all.
    I don’t think this expectation changes when we condition on the further rider “And the practicioners really only care about the truth re. whether the intervention works or not.” What I worry about going on under the hood is a stronger (and by my lights poorly substantiated) claim of rationalist exceptionalism: “Sure, although cognitive biases plague entire fields of science and can upend decades of results, and we’re appropriately quick to point out risk of bias of work done by outsiders, we can be confident that as we call ourselves rationalists/we teach rationality/we read the sequences/etc. we are akin to Penelope refusing her army of suitors—essentially incorruptible. So when we do similarly bias-susceptible sorts of things, we should give one another a pass.”
    I accept ‘gold standard RCTs’ are infeasible (very pricey, and how well can one really do ‘sham CFAR’?) yet I aver there is quite a large gap between this ideal of evidence and the actuality (i.e. evidence kept in house, and which emerges via reference in response to criticism) which could be bridged by doing more write-ups, looking for harder metrics that put one more reliably in touch with reality, and so on. I find it surprisingly incongruent that the sort of common cautions about cognitive biases—indeed, common cautions that seem predicates for CFAR’s value proposition (e.g. “Good faith is not enough”, “Knowing about the existence of biases does not make one immune to them”, Feynmann’s dictum about ‘you are the easiest person to fool’), are not reflected in its approach to self-evaluation.
    If nothing else, opening up more of CFAR’s rationale, evidence, etc. to outside review may allow more benefits of outside critique. Insofar as it is the case you found this exchange valuable, one may anticipate greater benefit from further interaction with higher-quality sceptics.