Further discussion of CFAR’s focus on AI safety, and the good things folks wanted from “cause neutrality”

Follow-up to: CFAR’s new focus, and AI Safety

In the days since we published our previous post, a number of people have come up to me and expressed concerns about our new mission. Several of these had the form “I, too, think that AI safety is incredibly important — and that is why I think CFAR should remain cause-neutral, so it can bring in more varied participants who might be made wary by an explicit focus on AI.”

I would here like to reply to these people and others, and to clarify what is and isn’t entailed by our new focus on AI safety.

First: Where are CFAR’s activities affected by the cause(s) it chooses to prioritize?

The question of which causes CFAR aims to help (via its rationality training) plugs into our day-to-day activities in at least four ways:
1) It affects which people we target. If AI safety is our aim, we must backchain from “Who is likely both to have a greater impact on AI safety if they have more rationality skills, and to be able to train rationality skills with us?” to decide whom to target with specialized workshops.
2) It affects which rationality skills we prioritize. AI safety work benefits from the ability to reason about abstract, philosophically confusing issues (notably: AI), which presumably benefits from various rationality skills. Competitive marathon running probably also benefits from certain rationality skills, but likely different ones. Designing an “art of rationality” that can support work on AI safety is different from designing an “art of rationality” for some other cause. (Although see point C, below.)
3) It affects what metrics or feedback systems we make interim use of, and how we evaluate our work. If “AI safety via rationality training” is the mission, then “person X produced work A that looks existential-risk-reducing on our best guess, and X says they would’ve been less able to do A without us” is the obvious proxy measure of whether we’re having impact. Given such a measure, we can use it to steer.
4) It affects explicit curriculum at AI-related or EA-related events. E.g., it affects whether we’re allowed to run events at which participants double crux about AI safety, and whether we’re allowed to present arguments from Bostrom’s Superintelligence without also presenting a commensurate amount of analysis of global poverty interventions.
In addition to the above four effects, it has traditionally also affected: 5) what causes/​opinions CFAR staff feel free to talk about when speaking informally to participants at workshops or otherwise representing CFAR. (We used to try not to bring up such subjects.)
One thing to notice, here, is that CFAR’s mission doesn’t just affect our external face; it affects the details of our day-to-day activities. (Or at minimum, it should affect these.) It is therefore very important that our mission be: (a) actually important; (b) simple, intelligible, and usable by our staff on a day-to-day basis; and (c) backed by a detailed (and, ideally, accurate) model in the heads of at least a few CFARians doing strategy (or, better, of all CFARians), so that the details of what we’re doing can in fact “cut through” to reducing existential risk.
So, okay, we just looked concretely at how CFAR’s mission (and, in particular, its prioritization of AI safety) can affect its day-to-day choices.
It’s natural next to ask what upsides people were hoping for from a (previous or imagined) “cause neutral” CFAR, and to discuss which of those upsides we can still access, and which we can’t. I’ll start with the ones we can do.

Some components that people may be hoping for from “cause neutral”, that we can do, and that we intend to do:

A. For students of all intellectual vantage points, we can make a serious effort to be “epistemically trustworthy relative to their starting point”.
By this I mean:
  • We can be careful to include all information that they, from their vantage point, would want to know—even if on our judgment, some of the information is misleading or irrelevant, or might pull them to the “wrong” conclusions.

  • Similarly, we can attempt to expose people to skilled thinkers they would want to talk with, regardless of those thinkers’ viewpoints; and we can be careful to allow their own thoughts, values, and arguments to develop, regardless of which “side” this may lead to them supporting.

  • More generally, we can and should attempt to cooperate with each student’s extrapolated volition, and to treat the student as they (from their initial epistemic vantage point, and with their initial values) would wish to be treated. Which is to say that we should not do anything that would work less well if the algorithm behind it were known, and that we should attempt to run such workshops (and to have such conversations, and so on) as would cause good people of varied initial views to stably, on reflection, want to participate in them.

In asserting this commitment, I do not mean to assert that others should believe this of us; only that we will aim to do it. You are welcome to stare skeptically at us about potential biases; we will not take offense; it is probably prudent. Also, our execution will doubtless have flaws; still, we’ll appreciate it if people point such flaws out to us.
B. We can deal forthrightly and honorably with potential allies who have different views about what is important.
That is: we can be clear and explicit about the values and beliefs we are basing CFAR’s actions on, and we can attempt to negotiate clearly and explicitly with individuals who are interested in supporting particular initiatives, but who disagree with us about other parts of our priorities.[1]
C. We can create new “art of rationality” content at least partly via broad-based exploratory play — and thus reduce the odds that our “art of rationality” ends up in a local optimum around one specific application.
That is: we can follow Feynman’s lead and notice and chase “spinning plates”. We can bring in new material by bringing in folks with very different skillsets, and seeing what happens to our art and theirs when we attempt to translate things into one another’s languages. We can play; and we can nourish an applied rationality community that can also play.

Some components that people may be hoping for from “cause neutral”, that we can’t or won’t do:

i. Appear to have no viewpoints, in hopes of attracting people who don’t trust those with our viewpoints.
We can’t do this one. Both CFAR as an entity and individual CFAR staff do in fact have viewpoints; there is no high-integrity way to mask that fact. Also, “integrity” isn’t window-dressing that one pastes onto a thing, or a nicety that one can compromise for the sake of results; “integrity” is a word for the basic agreements that make it possible for groups of people to work together while stably trusting one another. Integrity is thus structurally necessary if we are to get anything done at all.
All we can do is do our best to *be* trustworthy in our dealings with varied people, and assume that image will eventually track substance. (And if image doesn’t, we can look harder at our substance, see if we may still be subtly acting in bad faith, and try again. Integrity happens from the inside out.)
ii. Leave our views or plans stalled or vague, in cases where having a particular viewpoint would expose us to possibly being wrong (or to possibly alienating those who disagree).
Again, we can’t do this one; organizations need a clear plan for their actions to have any chance at either (i) working, or (ii) banging into data and allowing one to notice that the plan was wrong. Flinching from clearly and visibly held views is the mother of wasting time. (Retaining a willingness to say “Oops!” and change course is, however, key.)
iii. Emphasize all rationality use cases evenly. Cause all people to be evenly targeted by CFAR workshops.
We can’t do this one either; we are too small to pursue all opportunities without horrible dilution and failure to capitalize on the most useful opportunities.
We are presently targeting all workshops at: (a) folks who are more likely than usual to directly impact existential risk; (b) folks who will add to a robust rationality community; and/or (c) folks who will allow us to learn more about the art (e.g., by having a different mix of personalities, skills, or backgrounds than most folks here).
Coming soon:
  • CFAR’s history around our mission: How did we come to change?

[1] In my opinion, I goofed this up historically in several instances, most notably with respect to Val and Julia, who joined CFAR in 2012 with the intention to create a cause-neutral rationality organization. Most integrity-gaps are caused by lack of planning rather than strategic deviousness; someone tells their friend they’ll have a project done by Tuesday and then just… doesn’t. My mistakes here seem to me to be mostly of this form. In any case, I expect the task to be much easier, and for me and CFAR to do better, now that we have a simpler and clearer mission.