Does CFAR “eat its own dogfood”? Do the cognitive tools help in running the organization itself? Can you give concrete examples? Are you actually outperforming comparable organizations on any obvious metric due to your “applied rationality”? (Why ain’tcha rich? Or are you?)
A response to just the first three questions. I’ve been at CFAR for two years (since January 2018). I’ve noticed, especially during the past 2-3 months, that my mind is changing. Compared to a year, or even 6 months ago, it seems to me that my mind more quickly and effortlessly moves in some ways that are both very helpful and resemble some of the cognitive tools we offer. There’s obviously a lot of stuff CFAR is trying to do, and a lot of techniques/concepts/things we offer and teach, so any individual’s experience needs to be viewed as part of a larger whole. With that context in place, here are a few examples from my life and work:
Notice the person I’m talking to is describing a phenomenon but I can’t picture anything —> Ask for an example (Not a technique we formally teach at the workshop, but seems to me like a basic application of being specific. In that same vein: while teaching or explaining a concept, I frequently follow a concrete-abstract-concrete structure.)
I’m making a plan —> Walk through it, inner sim / murphyjitsu style (I had a particularly vivid and exciting instance of this a few weeks ago: I was packing my bag for a camping trip and found myself, without having explicitly set the intention to do so, simulating the following day with an eye for what I would need. I noticed that I would need my camping spork, which I hadn’t thought of before, and packed it! Things like this happen regularly when planning workshops and doing my day-to-day CFAR work, in addition to my personal life.)
I’m willing to make trades of money for time, like paying for Ubers or shorter flights or services, etc. I used to have a pretty rigid deontological rule for myself against ever spending money when I “didn’t need to,” which I now believe was to my own detriment. I think internalizing Units of Exchange and Goal Factoring and sunk cost and willingness to pay led me to a) more clearly see how much I value my time, and b) acquire felt “permission” (from myself) to make trades that previously seemed weird or unacceptable, to the great benefit of myself and others. I notice this shift when I’m doing ops work with people outside of CFAR and they say something like, “No, it’s just impossible to carpet the entire floor. The venue doesn’t want us to, and besides it would cost a ton of money.” And I say something like, “Well, we really value having a cozy, welcoming, comfortable space for participants, and we’d probably be willing to pay quite a bit for that. What if we had $2k to spend on it—could we do it then?” and they say, “What… I mean, I guess…” Or I’m talking to a friend about paying for membership in a pickup basketball league and I say, “So I might miss one or two of the five games, but I’d gladly pay the full-season price of $130 to play even just two or three really good pickup games, so I’m definitely in.” and he responds with something like, “Huh, well I didn’t think about it like that but I guess it’s worth that much to me…” I feel excited at what seems to me like more freedom for myself in this area. Some good dogfood.
Something needs doing in the org and I have a vision for how to do it —> Just create the 80-20 version of my proposal in 5 minutes and run with that. (This one is insane. It’s wild to me how many things fall through the cracks or never happen because the only thing missing was one enterprising soul saying, “Hey, I spent 5 minutes on this shitty first draft version — anyone have feedback or thoughts?” so people could tear it apart and improve it. I had an experience last week of creating a system for making reference check phone calls and just watching myself with pride and satisfaction; like, “Whoa, I’m just building this out of nothing”. There’s something here that’s general, what I think is meant by “agency,” what I’d call “self-efficacy”—the belief, the felt belief, that you are one of the people who can just build things and make things happen, and here you go just doing it. That seems to me to be one of the best pieces of dogfood I’ve eaten at CFAR. It’s an effect that’s tricky to measure/quantify, but we think there’s good reason to believe it’s there.)
I’m in a meeting and I hear, “Yeah, so let’s definitely take X action” and then silence or a change of topic —> Say, “Okay, so what’s going to happen next? Who’s owning this and when are we going to check in about whether it happened?” (Also insane. I’m shocked by the number of times that I have said “Oh no, we totally thought that was a good idea but never made sure it happened” and the only thing required to save the situation was the simple question above. This happens a lot less nowadays; one time a few weeks ago, both Adam and I had this exact reaction simultaneously in a meeting. It was awesome.)
Empiricism/tracking/something. Every time I go to the YMCA to swim, I log the time that I walk into the building and the time I walk out of the building. I started doing this because my belief that “I can get in and out for a swim in less than an hour” has persisted for months, in the face of consistent evidence that it actually just takes an hour or more every time (gotta get changed, shower, swim, shower, get changed—it adds up!), and has caused me stress and frustration. Earlier this year, Brienne and I spent some time working on noticing skills, working that into the mainline curriculum and our instructor training curriculum, as well as practicing in daily life. So I decided not to try to do anything different, just to observe—give myself a detailed, clear, entirely truthful picture of exactly how long I take at the Y. In the two months I’ve been doing this tracking, the average amount of time I spend in the Y has dropped by about 15 minutes. My perspective on “how much time it takes me to swim” has changed, too; from “God dammit, I’m taking too long but I really want to take my time working out!” to “Sometimes I want to have a nice, slow workout, sit in the hot tub and take a while. Sometimes I want to move quickly so I can get shit done. I have the capacity to do both of those.” I care a little about the time, and a lot about the satisfaction, self-efficacy, and skill that came from giving myself the gift of more time simply by allowing some part of my system to update and change my behavior as a consequence of consistently providing myself with a clearer picture of this little bit of territory. That’s some sweet dogfood.
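The tracking practice described above needs nothing more than a two-column log of in/out times plus an average. A minimal sketch of what that could look like (all the times here are invented for illustration, not real log data):

```python
from datetime import datetime, timedelta

# Hypothetical log of (walk-in, walk-out) timestamps; real entries would
# come from whatever you jot down on your phone on the way in and out.
visits = [
    ("2019-11-01 17:02", "2019-11-01 18:11"),
    ("2019-11-08 17:10", "2019-11-08 18:05"),
    ("2019-11-15 16:55", "2019-11-15 17:52"),
]

FMT = "%Y-%m-%d %H:%M"

def average_minutes(log):
    """Average visit length in minutes over the logged visits."""
    total = timedelta()
    for time_in, time_out in log:
        total += datetime.strptime(time_out, FMT) - datetime.strptime(time_in, FMT)
    return total.total_seconds() / 60 / len(log)

print(f"average visit: {average_minutes(visits):.0f} minutes")  # 60 minutes here
```

The point isn’t the arithmetic; it’s that writing the numbers down makes the actual average impossible to ignore, which is what lets the belief update.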
(Just responding here to whether or not we dogfood.)
I always have a hard time answering this question, and nearby questions, personally.
Sometimes I ask myself whether I ever use goal factoring, or seeking PCK, or IDC, and my immediate answer is “no”. That’s my immediate answer because when I scan through my memories, almost nothing is labeled “IDC”. It’s just a continuous fluid mass of ongoing problem solving full of fuzzy inarticulate half-formed methods that I’m seldom fully aware of even in the moment.
A few months ago I spent some time paying attention to what’s going on here, and what I found is that I’m using either the mainline workshop techniques, or something clearly descended from them, many times a day. I almost never use them on purpose, in the sense of saying “now I shall execute the goal factoring algorithm” and then doing so. But if I snap my fingers every time I notice a feeling of resolution and clarification about possible action, I find that I snap my fingers quite often. And if, after snapping my fingers, I run through my recent memories, I tend to find that I’ve just done goal factoring almost exactly as it’s taught in mainlines.
This, I think, is what it’s like to fully internalize a skill.
I’ve noticed the same sort of thing in my experience of CFAR’s internal communication as well. In the course of considering our answers to some of these questions, for example, we’ve occasionally run into disagreements with each other. In the moment, my impression was just that we were talking to each other sensibly and working things out. But if I scan through a list of CFAR classes as I recall those memories, I absolutely recognize instances of inner sim, trigger-action planning, againstness, goal factoring, double crux, systemization, comfort zone exploration, internal double crux, pedagogical content knowledge, Polaris, mundanification, focusing, and the strategic level, at minimum.
At one point when discussing the topic of research I said something like, “The easiest way for me to voice my discomfort here would involve talking about how we use words, but that doesn’t feel at all cruxy. What I really care about is [blah]”, and then I described a hypothetical world in which I’d have different beliefs and priorities. I didn’t think of myself as “using double crux”, but in retrospect that is obviously what I was trying to do.
I think techniques look and feel different inside a workshop vs. outside in real life. So different, in fact, that I think most of us would fail to recognize almost every example in our own lives. Nevertheless, I’m confident that CFAR dogfoods continuously.
I don’t really know, because I’m not quite sure what CFAR’s values are as an organization, or what its extrapolated volition would count as satisfaction criteria.
My guess is “not much, not yet”. According to what I think it wants to do, it seems to me like its progress on that is small and slow. It seems pretty disorganized and flaily much of the time, not great at getting the people it most needs, and not great at inspiring or sustaining the best in the people it has.
I think it’s *impressively successful* given how hard I think the problem really is, but in absolute terms, I doubt it’s succeeding enough.
If it weren’t dogfooding, though, it seems to me that CFAR would be totally non-functional.
Why would it be totally non-functional? Well, that’s really hard for me to get at. It has something to do with what sort of thing a CFAR even is, and what it’s trying to do. I *do* think I’m right about this, but most of the information hasn’t made it into the crisp kinds of thoughts I can see clearly and make coherent words about. I figured I’d just go ahead and post this anyhow, and y’all can make or not-make what you want of my intuitions.
More about why CFAR would be non-functional if it weren’t dogfooding:
As I said, my thoughts aren’t really in such a state that I know how to communicate them coherently. But I’ve often found that going ahead and communicating incoherently can nevertheless be valuable; it lets people’s implicit models interact more rapidly (both between people and within individuals), which can lead to developing explicit models that would otherwise have remained silent.
So, when I find myself in this position, I often throw a creative prompt to the part of my brain that thinks it knows something, and don’t bother trying to be coherent, just to start to draw out the shape of a thing. For example, if CFAR were a boat, what sort of boat would it be?
If CFAR were a boat, it would be a collection of driftwood bound together with twine. Each piece of driftwood was yanked from the shore in passing when the boat managed to get close enough for someone to pull it in. The riders of the boat are constantly re-organizing the driftwood (while standing on it), discarding parts (both deliberately and accidentally), and trying out variations on rudders and oars and sails. All the while, the boat is approaching a waterfall, and in fact the riders are not trying to make a boat at all, but rather an airplane.
The CFAR techniques are first of all the driftwood pieces themselves, and are also ways of balancing atop something with no rigid structure, of noticing when the raft is taking on water, of coordinating about which bits of driftwood ought to be tied to which other bits, and of continuing to try to build a plane when you’d rather forget the waterfall and go for a swim.
Which, if I had to guess, is an impressionistic painting depicting my concepts around an organization that wants to bootstrap an entire community into rising to the maybe-impossible task of thinking well enough to survive x-risk.
This need to quickly bootstrap patterns of thought and feeling, not just of individual humans but of far-flung assortments of people, is what makes CFAR’s problem so hard, and its meager success thus far so impressive to me. It doesn’t have the tools it needs to efficiently and reliably accomplish the day-to-day tasks of navigation and not sinking and so forth, so it tries to build them by whatever means it can manage in any given moment.
It’s a shitty boat, and an even shittier plane. But if everyone on it were just passively riding the current, rather than constantly trying to build the plane and fly, the whole thing would sink well before it reached the waterfall.
I think we eat our own dogfood a lot. It’s pretty obvious in meetings—e.g., people do Focusing-like moves to explain subtle intuitions, remind each other to set TAPs, do explicit double cruxing, etc.
As to whether this dogfood allows us to perform better—I strongly suspect so, but I’m not sure what legible evidence I can give about that. It seems to me that CFAR has managed to have a surprisingly large (and surprisingly good) effect on AI safety as a field, given our historical budget and staff size. And I think there are many attractors in org space (some fairly powerful) that would have made CFAR less impactful had it fallen into them, and that it has avoided in part because its staff developed unusual skill at noticing confusion and resolving internal conflict.
I’m reading the replies of current CFAR staff with great interest (I’m a former staff member who ended work in October 2018), as my own experience within the org was “not really; to some extent yes, in a fluid and informal way, but I rarely see us sitting down with pen and paper to do explicit goal factoring or formal double crux, and there’s reasonable disagreement about whether that’s good, bad, or neutral.”
All of these answers so far (Luke, Adam, Duncan) resonate for me.
I want to make sure I’m hearing you right though, Duncan. Putting aside the ‘yes’ or ‘no’ of the original question, do the scenes/experiences that Luke and Adam describe match what you remember from when you were here?
They do. The distinction seems to me to be something like endorsement of a “counting up” strategy/perspective versus endorsement of a “counting down” one, or reasonable disagreement about which parts of the dogfood are actually beneficial to eat at what times versus which ones are Goodharting or theater or low payoff or what have you.
I wrote the following comment during this AMA back in 2019, but didn’t post it because of the reasons that I note in the body of the comment.
I still feel somewhat unsatisfied with what I wrote. I think something about the tone feels wrong, or gives the wrong impression, somehow. Or maybe this only presents part of the story. But it still seems better to say aloud than not.
I feel more comfortable posting it now, since I’m currently early in the process of attempting to build an organization / team that does meet these standards. In retrospect, I think probably it would have been better if I had just posted this at the time, and hashed out some disagreements with others in the org in this thread.
(In some sense this comment is useful mainly as a bit of a window into the kind of standards that I, personally, hold a rationality-development / training organization to.)
My original comment is reproduced verbatim below (plus a few edits for clarity).
I feel trepidation about posting this comment, because it seems in bad taste to criticize a group unless one is going to step up and do the legwork to fix the problem. This is one of the top 5 things that bother me about CFAR, and maybe I will step up to fix it at some point, but I’m not doing that right now, and there are a bunch of hard problems that people are doing diligent work to fix. Criticizing is cheap. Making things better is hard.
[edit 2023: I did run a year long CFAR instructor training that was explicitly designed to take steps on this class of problems though. It is not as if I was just watching from the sidelines. But shifting the culture of even a small org, especially from a non-executive role, is pretty difficult, and my feeling is that I made real progress in the direction that I wanted, but only about one twentieth of the way to what I would think is appropriate.]
My view is that CFAR does not meaningfully eat its own dogfood, or at least doesn’t enough, and that this hurts the organization’s ability to achieve its goals.
This is not to contradict the anecdotes that others have left here, which I think are both accurate presentations, and examples of good (even inspiring) actions. But while some members of CFAR do have personal practices (with varying levels of “seriousness”) in correct thought and effective action, CFAR, as an institution, doesn’t really make much use of rationality. I resonate strongly with Duncan’s comment about counting up vs. counting down.
More specific data, both positive and negative:
CFAR did spend some 20 hours of staff meeting time Circling in 2017, separately from a ~50-hour CFAR Circling retreat that most of the staff participated in, and various other Circling events that CFAR staff attended together (but were not “run by CFAR”).
I do often observe people doing Focusing moves and Circling moves in meetings.
I have observed occasional full explicit Double Crux conversations on the order of three or four times a year.
I frequently (on the order of once every week or two) observe CFAR staff applying the Double Crux moves (offering cruxes, crux checking, operationalizing, playing the Thursday-Friday game) in meetings and in conversation with each other.
Group goal-factoring has never happened, to the best of my knowledge, even though there are a number of things that happen at CFAR that seem very inefficient, seem like “shoulds”, or are frustrating / annoying to at least one person [edit 2023: these are explicit triggers for goal factoring]. I can think of only one instance in which two of us (Tim and I, specifically) tried to goal-factor something (a part of meetings that some of us hate).
We’ve never had an explicit group pre-mortem, to the best of my knowledge. There is the occasional two-person session of simulating a project (usually a workshop or workshop activity) and the ways in which it goes wrong. [edit 2023: Anna said that she had participated in many long-form postmortems regarding hiring in particular, when I sent her a draft of this comment in 2019.]
There is no infrastructure for tracking predictions or experiments. Approximately, CFAR as an institution doesn’t really run [formal] experiments, at least experiments with results that are tracked by anything other than the implicit intuitions of the staff. [edit 2023: some key features of a “formal experiment” as I mean it are writing down predictions in advance, and having a specific end date at which the group reviews the results. This is in contrast to simply trying new ideas sometimes.]
There are no explicit processes for iterating on new policies or procedures (such as iterating on how meetings are run).
[edit 2023: An example of an explicit process for iterating on policies and procedures is maintaining a running document for a particular kind of meeting. Every time you have that kind of meeting, you start by referring to the notes from the last session. You try some specific procedural experiments, and then end the meeting with five minutes of reflection on what worked well or poorly, and log those in the document. This way you are explicitly trying new procedures and capturing the results, instead of finding procedural improvements mainly by stumbling into them, and often forgetting improvements rather than integrating and building upon them. I use documents like this for my personal procedural iteration.
Or in Working Backwards, the authors describe not just organizational innovations that Amazon came up with to solve explicitly-noted organizational problems, but the sequence of iteration that led to those final form innovations.]
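The prediction-tracking infrastructure mentioned a few points up could start very small: a log where each entry records a credence written down in advance plus a fixed review date, scored once the outcome is known. A hypothetical sketch of such a log (the claims, numbers, and field names here are all invented, not anything CFAR uses):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Prediction:
    claim: str
    probability: float              # credence recorded in advance
    review_by: date                 # specific end date for resolving the prediction
    outcome: Optional[bool] = None  # filled in at review time

def brier_score(log):
    """Mean squared error of credences on resolved predictions; lower is better."""
    resolved = [p for p in log if p.outcome is not None]
    return sum((p.probability - p.outcome) ** 2 for p in resolved) / len(resolved)

log = [
    Prediction("New meeting format cuts average meeting length 20%",
               0.6, date(2020, 3, 1), outcome=True),
    Prediction("Curriculum change raises workshop feedback scores",
               0.7, date(2020, 3, 1), outcome=False),
]
print(f"Brier score so far: {brier_score(log):.3f}")  # 0.325 for these entries
```

Even a log this simple forces the two features of a formal experiment noted above: the prediction exists before the outcome, and there is a date by which the group must look at the result.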
There is informal, but effective, iteration on the workshops. The processes that run CFAR’s internals, however, seem to me to be mostly stagnant [edit 2023: in the sense that there’s not deliberate, intentional effort on solving long-standing institutional frictions, or on developing more effective procedures for doing things.]
As far as I know, there are no standardized checklists for employing CFAR techniques in relevant situations (like starting a new project). I wouldn’t be surprised if there were some ops checklists with a murphyjitsu step. I’ve never seen a checklist for a procedure at CFAR, excepting some recurring shopping lists for workshops.
The interview process does not incorporate the standard research about interviews and assessment contained in Thinking, Fast and Slow. (I might be wrong about this. I, blessedly, don’t have to do admissions interviews.)
No strategic decision or choice to undertake a project, that I’m aware of, has involved quantitative estimates of impact, or quantitative estimates of any kind. (I wouldn’t be surprised if the decision to run the first MSFP did, [edit 2023: but I wasn’t at CFAR at the time. My guess is that there wasn’t.])
Historically, strategic decisions were made to a large degree by inertia. This is more resolved now, but for a period of several years, I think most of the staff didn’t really understand why we were running mainlines, and in fact when people [edit 2023: workshop participants] asked about this, we would say things like “well, we’re not sure what else to do instead.” This didn’t seem unusual, and didn’t immediately call out for goal factoring.
There’s no designated staff training time for learning or practicing the mental skills, or for doing general tacit knowledge transfer between staff. However, full-time CFAR staff have historically had a training budget, which they could spend on whatever personal development stuff they wanted, at their own discretion.
CFAR does have a rule that you’re allowed / mandated to take rest days after a workshop, since the workshop eats into your weekend.
Overall, CFAR strikes me as mostly a normal company, populated by some pretty weird hippy-rationalists. There aren’t any particular standards by which employees are expected to use rationality techniques, nor institutional procedures for doing rationality [edit 2023: as distinct from having shared rationality-culture].
This is in contrast to, say, Bridgewater Associates, which is clearly structured intentionally to enable updating and information processing on the organizational level. (Incidentally, Bridgewater is rich in the most literal sense.)
Also, I’m not fully exempt from these critiques myself: I have not really internalized goal factoring yet, for instance, and think that I, personally, am making the same kind of errors of inefficient action that I’m accusing CFAR of making. I also don’t make much use of quantitative estimates, and while I have lots of empirical iteration procedures, I haven’t really gotten the hang of doing explicit experiments. (I do track decisions and predictions, though, for later review.)
Overall, I think this gap is due about 10% to “these tools don’t work as well, especially at the group level, as we seem to credit them, and we are correct to not use them”, about 30% to this being harder to do than it seems, and about 60% to CFAR not really trying at this (and maybe it shouldn’t be trying at this, because there are trade-offs and other things to focus on).
Elaborating on the 30%: I do think that making an org like this, especially when not starting from scratch, is deceptively difficult. While implementing some of these seems trivial on the surface, it actually entails a shift in culture and expectations, and doing this effectively requires leadership and institution-building skills that CFAR doesn’t currently have. Like, if I imagine something like this existing, it would need to have a pretty in-depth onboarding process for new employees, teaching the skills and presenting “how we do things here.” If you wanted to bootstrap into this kind of culture, at anything like a fast enough speed, you would need the same kind of onboarding for all of the existing employees, but it would be even harder, because you wouldn’t already have the culture going to provide examples and immersion.
So, is CFAR rich?
I don’t really know, because I’m not quite sure what CFAR’s values are as an organization, or what its extrapolated volition would count as satisfaction criteria.
My guess is “not much, not yet”. According to what I think it wants to do, it seems to me like its progress on that is small and slow. It seems pretty disorganized and flaily much of the time, not great at getting the people it most needs, and not great at inspiring or sustaining the best in the people it has.
I think it’s *impressively successful* given how hard I think the problem really is, but in absolute terms, I doubt it’s succeeding enough.
If it weren’t dogfooding, though, it seems to me that CFAR would be totally non-functional.
Why would it be totally non-functional? Well, that’s really hard for me to get at. It has something to do with what sort of thing a CFAR even is, and what it’s trying to do. I *do* think I’m right about this, but most of the information hasn’t made it into the crisp kinds of thoughts I can see clearly and make coherent words about. I figured I’d just go ahead and post this anyhow, and y’all can make or not-make what you want of my intuitions.
More about why CFAR would be non-functional if it weren’t dogfooding:
As I said, my thoughts aren’t really in such a state that I know how to communicate them coherently. But I’ve often found that going ahead and communicating incoherently can nevertheless be valuable; it lets people’s implicit models interact more rapidly (both between people and within individuals), which can lead to developing explicit models that would otherwise have remained tacit.
So, when I find myself in this position, I often throw a creative prompt to the part of my brain that thinks it knows something, and don’t bother trying to be coherent, just to start to draw out the shape of a thing. For example, if CFAR were a boat, what sort of boat would it be?
If CFAR were a boat, it would be a collection of driftwood bound together with twine. Each piece of driftwood was yanked from the shore in passing when the boat managed to get close enough for someone to pull it in. The riders of the boat are constantly re-organizing the driftwood (while standing on it), discarding parts (both deliberately and accidentally), and trying out variations on rudders and oars and sails. All the while, the boat is approaching a waterfall, and in fact the riders are not trying to make a boat at all, but rather an airplane.
The CFAR techniques are first of all the driftwood pieces themselves, and are also ways of balancing atop something with no rigid structure, of noticing when the raft is taking on water, of coordinating about which bits of driftwood ought to be tied to which other bits, and of continuing to try to build a plane when you’d rather forget the waterfall and go for a swim.
Which, if I had to guess, is an impressionistic painting depicting my concepts around an organization that wants to bootstrap an entire community into rising to the maybe-impossible task of thinking well enough to survive x-risk.
This need to quickly bootstrap patterns of thought and feeling, not just of individual humans but of far-flung assortments of people, is what makes CFAR’s problem so hard, and its meager success thus far so impressive to me. It doesn’t have the tools it needs to efficiently and reliably accomplish the day-to-day tasks of navigation and not sinking and so forth, so it tries to build them by whatever means it can manage in any given moment.
It’s a shitty boat, and an even shittier plane. But if everyone on it were just passively riding the current, rather than constantly trying to build the plane and fly, the whole thing would sink well before it reached the waterfall.
I think we eat our own dogfood a lot. It’s pretty obvious in meetings—e.g., people do Focusing-like moves to explain subtle intuitions, remind each other to set TAPs, do explicit double cruxing, etc.
As to whether this dogfood allows us to perform better—I strongly suspect so, but I’m not sure what legible evidence I can give about that. It seems to me that CFAR has managed to have a surprisingly large (and surprisingly good) effect on AI safety as a field, given our historical budget and staff size. And I think there are many attractors in org space (some fairly powerful) that would have made CFAR less impactful had it fallen into them; it has avoided falling into them in part because its staff developed unusual skill at noticing confusion and resolving internal conflict.
I’m reading the replies of current CFAR staff with great interest (I’m a former staff member who ended work in October 2018), as my own experience within the org was “not really; to some extent yes, in a fluid and informal way, but I rarely see us sitting down with pen and paper to do explicit goal factoring or formal double crux, and there’s reasonable disagreement about whether that’s good, bad, or neutral.”
All of these answers so far (Luke, Adam, Duncan) resonate for me.
I want to make sure I’m hearing you right though, Duncan. Putting aside the ‘yes’ or ‘no’ of the original question, do the scenes/experiences that Luke and Adam describe match what you remember from when you were here?
They do. The distinction seems to me to be something like endorsement of a “counting up” strategy/perspective versus endorsement of a “counting down” one, or reasonable disagreement about which parts of the dog food are actually beneficial to eat at what times versus which ones are Goodharting or theater or low payoff or what have you.
I wrote the following comment during this AMA back in 2019, but didn’t post it for the reasons I note in the body of the comment.
I still feel somewhat unsatisfied with what I wrote. I think something about the tone feels wrong, or gives the wrong impression, somehow. Or maybe this only presents part of the story. But it still seems better to say aloud than not.
I feel more comfortable posting it now, since I’m currently early in the process of attempting to build an organization / team that does meet these standards. In retrospect, I think probably it would have been better if I had just posted this at the time, and hashed out some disagreements with others in the org in this thread.
(In some sense this comment is useful mainly as a bit of a window into the kind of standards that I, personally, hold a rationality-development / training organization to.)
My original comment is reproduced verbatim below (plus a few edits for clarity).