Rationality Research Report: Towards 10x OODA Looping?

6 months ago I wrote Feedbackloop-first Rationality. I didn’t follow up on it for a while (except for the sporadic Deliberate (“Purposeful?”) Practice Club).

I just spent 6 weeks actually exploring “how would I build my own cognition training program?”. In the process of doing so, I’ve iterated a bunch. I’m still in an orienting phase, but it seemed worth writing down the current stage of my thoughts.

What’s my goal?

A rough overview:

  • I want to get more, higher-quality “x-risk thinker hours.”

    • This includes AI alignment technical research, AI macrostrategy research, policy, governance, as well as people (such as the Lightcone team) deciding which infrastructure to build.

  • I’m particularly interested in getting more “serial research”, as opposed to more “parallel research.” We can throw more researchers at a problem, but if there are some problems that require one person to synthesize 10+ years of experience, all the parallel research won’t help.

  • An obvious way to improve researcher hours is “via mentorship”, but I think there is a mentorship bottleneck. So, I’m interested in strategies for training tacit cognitive skills that either don’t require mentorship, or that leverage expertise from outside the current x-risk ecosystem.

This is all parented under the higher level goal of “contribute meaningfully to x-risk reduction”, but it feels relevant/​meaty enough to be worth running at this goal for awhile.

“Rationality for the sake of existential risk”

A part of me romantically wants to pursue “rationality training for rationality training’s sake.” Alas, the world is big and my time is limited, and I just can’t justify putting years of effort into something I don’t think will help with x-risk.

CFAR went through a phase where (some leaders) framed things as:

“Rationality, for the sake of rationality, for the sake of existential risk.”

i.e. try to earnestly build something rationality-focused for its own sake, because that seemed both healthier and better for x-risk than “rationality for the sake of x-risk”, directly.

I think this was a reasonable thing to try, but my impression is this didn’t work that well. If you tell yourself (and your students) “I’m doing this for the sake of rationality itself”, but then in practice you’re getting people to delicately open up their soul and figure out their true goals… all the while radiating “man I really hope your goals turn out to involve saving the world from AI”, that may fuck up the “earnestly try to figure out your goals” process.

So:

I am not here to help you earnestly figure out your goals. That’s an important part of rationality, and it might come about incidentally while people do exercises I develop, but it’s not what I’m focused on this year.

I am here to develop and teach cognitive skills, which help you solve confusing problems at the edge of your ability. I’m doing this to push forward humanity’s frontier of “how quickly can we do challenging research?”, and strive towards 10x science.

I will prioritize learning and teaching those skills with people who seem like they are going to help with x-risk somehow, but I aim to write up a lot of stuff publicly, trying where possible to output exercises that other people can do on their own, for whatever reasons they want. (See Exercise: Solve “Thinking Physics” as an example.)

The Story So Far

Feedback-loops and “deliberate practice”, vs “Just Clicking”

I just spent a month workshopping various “teaching rationality” plans. My initial ideas were framed around:

  • Deliberate practice is costly and kinda sucks

  • Therefore, people haven’t invested in it much, either as “rationality training programs” or as “alignment research training programs.”

  • Therefore, there may be opportunity to build useful training programs, premised on the notion of “actually put in the work to do the practice”.

i.e. “Deliberate practice kinda sucks, thus it’s undervalued, thus there’s alpha in it.”

I still believe this. But… I do grudgingly admit to myself that deliberate practice is, like, really costly, and sucks a lot. It’s exhausting, and it seems (at least for me) to require using my peak hours of the day, trading off directly against my day job. It’s frustrating and easy to bounce off of. It took me 30-50 hours over the course of months to get noticeably better at a videogame. It took me 40 hours over 2 weeks to get noticeably better at Thinking Physics exercises.

I think we could optimize the pedagogy here. The thing that separates actual “deliberate practice” from “regular practice” is that it’s been battle tested and found to actually quickly move you to the frontier of expertise. But this still seems like a long, effortful project, so it seems worth asking:

Can we find cognitive skills that just click, rather than requiring dozens of hours of practice, that still provide a major cognitive edge?

What About CFAR? Didn’t they teach “just click” skills?

You might ask “how does this relate to the Center for Applied Rationality and all the stuff they did?”. In particular, CFAR taught a bunch of stuff in a four-day workshop. Shouldn’t that stuff have been aimed at “things that just click”? What’s my new angle here?

I think the mechanism of CFAR was something like:

“Create a transformative workshop environment. Throw a lot of different tools/​skills/​ideas at people in one weekend. Most tools/​skills/​ideas won’t necessarily help most people, but each person hopefully finds 1-3 tools that are immediately useful, which gives them a sense that more is possible. And meanwhile the workshop conveys an overall mindset of systematically/​agentically solving your problems.”

I’m currently aiming at something more like:

“Convey a tightly clustered set of skills that weave into one ‘deeper skill’, over the course of 1-2 weeks. Then, build a good followup environment, where people who attended the workshop reliably get a practice/​check-in session once a week, for the next 6-12 months, to ensure those skills actually permeate their life.”

Hamming-nature, 10x plans, OODA Loops

One skill-cluster seemed noteworthy in that:

  • I think someone could learn it in ~a week, if they had the right prerequisites.

  • I think there exist people who are smart and capable, but nonetheless don’t have this skill (or could stand to further improve at it).

  • It’d be immediately really useful, instead of taking like 6 months of practice.

That skill is: making plans that are 10x better than your current plans. (And, ideally, having a habit of doing this, such that your plans end up 100-1000x better overall.)

I mean “plans” in a pretty broad sense. I think it includes going down a research path, launching a product, deciding to go-to-school and get-a-job, etc.

I could simplify the process down to:

  1. Generating multiple plans that you feel reasonably excited about.

  2. Noticing the ways that the best plans don’t actually work, or could be dramatically improved. Iterate on them until they’re the best version of themselves.

  3. Estimating the value of those plans.

  4. Actually shifting away from your current favorite to a plan that you think is 10x better than it.

  5. Having the judgment to either persevere with that plan when it gets hard, or, pivot again.

The “actually pivot away from current favorite plan” is perhaps the hardest part. It may require grieving important parts of your current favorite, which the new plan won’t accomplish. But I think the most important step is “actually have multiple alternative plans that you believe in.”[1] This makes pivoting more natural, less painful.

This is related to asking yourself The Hamming Question (“what’s the most important problem in your field (or, life) and why aren’t you working on it?”). But it’s somewhat broader. I think “could I 10x my plans?” can be a useful frame even if you feel averse to “what’s literally the most important problem I could focus on?”. And even if you have set your target on The Most Important Problem, asking “okay but can I do this 10x faster or better?” is still a useful question to ask.

“Planning” vs “OODA Loops”

The direction I’m currently exploring is: “Okay, but planning is actually only one facet of a complete decisionmaking loop. Can I learn for myself, and teach others, the full-stack skillset of a competent OODA loop?”.

I currently feel a bit confused about this. I feel like I have a clear vision of how to improve at planmaking. (Or at least, what next things to try). I feel a lot fuzzier on how the various Observe/​Orient/​Decide/​Act steps fit together into a cohesive skillset, and how to teach it.

My explorations so far have demonstrated “man, people come into this with all kinds of different skill gaps here, and I’m not sure how to build a single program that would teach it reliably.”

But, when I imagine just trying to teach the “10x planning” workshop, I imagine people… making some better plans, and becoming temporarily better at planning, and then… sort of forgetting about it and moving on. I feel like “the pedagogical work isn’t done” until the program has somehow taught the full OODA process, in a way that repeats.

My Process: “Test Driven Development”

My methods here still route through the sorts of exercises I was imagining when I wrote Feedbackloop-first Rationality. But I now have a bit more of a skeleton of “how to design exercises that teach particular skills, which build into an immediately valuable skill.”

My process involves interleaving:

  1. ~3 hour exercises that have a clear “right answer”, but which require you to wrestle with gnarly confusing problems on your own. You get some guidance on how to approach the problem, but a major component is almost always “figure out how to generate solutions on your own, and then reflect on which solutions actually worked.”

  2. Longer sessions where you apply the skills from those exercises on real life problems.

An important component is that the 3-hour exercises are in domains that are as different from each other as possible. So you’re not merely learning “a skill”, you’re learning “how to generate solutions to novel problems.”

For example, you might train on making “a plan” in a simplified videogame environment, and then go through multiple OODA loops as you implement that plan. Then, go design real-life plans for your real-life goals, referring back to the skills from the simplified exercise.

This aims to build up the skill of transferring knowledge from one domain to another.

Alternate Strategies and/​or Theories of Change

Obviously, if I’m taking “10x planning” seriously, I should be applying it to myself. If I’m not ending up conceiving of (and actually pivoting to) plans that are 10x better than what I started with, why should I expect my process is any good?

The “teach 10x planning in a week, plus months of weekly followup sessions” plan seems much more likely to work, and to be time-efficient, than my previous BATNA of “brute force deliberate practice.” But my current process involves having 3-10 alternate plans that feel like real contenders, and periodically iterating on each of them as I learn more.

Here are my current contenders for alternate approaches. Some of these are “plans” and some of them are more like “useful project outputs” that aren’t quite plan-shaped.

#1: Help senior researchers with specific targeted problems.

When I started this project, I assumed “the best researchers” wouldn’t need my help with metacognitive skills. I saw clear gaps in junior and mid-level researchers, but the researchers who produce the work I’m most excited about seemed to have pretty good cognitive strategies, or at least a mysterious process I was afraid to mess around with.

My current guess is that this is largely true. But it now seems to me that while senior researchers are “good at” metacognition, it’s usually not the thing they specialize in. There’s a lot of depth to metacognition that’s just hard to master and apply, and keeping track of all the options that have floated outside their context window is difficult.

I think the best time to try helping a senior researcher with metacognition is when something has recently, obviously gone wrong, so that a) the researcher believes it’s worth investigating their process, and b) there’s a clear object-level example to talk about.

I’m not sure how to scale this, and I’d expect each senior researcher to have pretty unique problems and psychologies. So for now this is more like something I’ll opportunistically seize upon rather than aggressively pursue, but I do think it might be much more cost-effective insofar as it’s tractable.

My current tool here is applying the 5 Whys technique from the Lean Startup methodology to “research process failures.” (An important variation is that I think it’s usually necessary to ask 6-7 whys instead of 5, because the 6th or 7th tends to be where “a root rationality failure” happened; 5 Whys was designed more to deal with physical process failures.)
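To make that variation concrete, here’s a hypothetical “6-7 whys” chain sketched in Python. The failure, every “why” in the chain, and the `deep_enough` helper are all invented for illustration; the only point is that the chain keeps going past the usual 5 levels until it reaches a candidate root rationality failure.

```python
# Hypothetical "6-7 whys" log for an invented research-process failure.
# chain[0] is the failure itself; each later entry answers "why?" about
# the one before it.

whys = [
    "The experiment produced no usable data.",              # the failure
    "Why? The pipeline silently dropped malformed rows.",   # why #1
    "Why? Nobody wrote a validation step.",                 # why #2
    "Why? The plan allotted no time for infrastructure.",   # why #3
    "Why? I assumed the dataset was clean.",                # why #4
    "Why? I never listed my assumptions before starting.",  # why #5
    "Why? I have no habit of red-teaming my plans.",        # why #6: candidate root rationality failure
]

def deep_enough(chain, min_whys=6):
    """Check we've asked at least min_whys whys (chain[0] is the failure)."""
    return len(chain) - 1 >= min_whys

assert deep_enough(whys)
```

The check is trivial on purpose: the hard part of the exercise is generating honest whys, not counting them.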

#2: Build a ‘Thinking Assistant’ Pipeline

One way to improve people’s research output is to hire fulltime assistants. There are a few different flavors of this, in ascending order of skill requirements.

  • Body Double is a low-ish-skill position: “sit next to someone while they work, notice if they are getting distracted, and encourage the person to stay on track rather than bouncing off things that are hard or aversive.” Focusmate is a maximally cheap version of this, but IMO it’s easy to slide out of the habit of it, and it can feel somewhat less “real.”

  • Rubber Duck. Similar to Body Double but the researcher is constantly talking out loud about their thought process. In many cases the Rubber Duck may need enough technical background to follow the conversation.

  • Metacognitive Assistants have the explicit job of tracking your attention, your goals, your metacognitive habits. They keep track of things that have fallen out of your strategic context window. (“Secretaries”/​”Executive Assistants” often play many of these roles, in addition to basically being a personal assistant who also deals with various other problems so you don’t have to. I’m imagining a version that specializes in improving your research output.)

  • Research Assistant/​Apprentice. This is a more involved role where you’re deeply embedding someone in your research thought process, training them in your paradigm.

I’ve heard a mixture of success stories and failure stories about each of these. I think there’s an important “matchmaking” element here, such that the assistant feels helpful rather than annoying.

One role that Thinking Assistants can play is “help prototype apps that can eventually be ‘AI-assisted alignment research’ tools.” A lot of LLM technology is not yet powerful enough to reliably augment a researcher’s thought process, but it might later work, and meanwhile you can prototype the experience using a skilled human.

This entire thread can relate to the previous “help particular senior researchers with particular problems” thread – I can imagine meeting with a senior researcher to discuss their problems, and in some cases it might turn out that hiring some kind of assistant is a good longterm solution.

#3. Learning “Generalized Research Taste”

“10x planning” and “10x OODA looping” feel like my most tractable ideas. But another major thread I’ve been following is asking “is there a generalized skill of ‘research taste’ that transfers across domains?”

I’m interested in this because there’s a lot of disagreement about what counts as “real” alignment research. Programs like MATS can match junior researchers up with mentors, to gain research taste in particular domains like Agent Foundations, Interpretability, Evals, Model Organisms, etc. This might help a junior researcher skill up and make contributions in a particular domain.

But, how do you decide which domain to specialize in in the first place? How do you figure out if you should pivot or adapt your domain, later?

I have some hopes that there turns out to be a skill of either...

  1. rapidly gaining research taste in multiple domains, and then cross referencing them against each other

  2. learning the skill of generating research taste from first principles, testing that it works, and then applying that skill to the field of alignment, such that you have some reason to think your taste will be any good.

Chris Olah has explored some exercises for developing research taste that seem like useful stepping stones here.

The sort of plan I’m imagining here is:

  • We get multiple experts in different fields with subtle taste, where it’s well established what expertise looks like. (These can be random fields, although it’s helpful if they are at least adjacent to plausible-AI-alignment cognitive work)

  • The experts design questions like “in this situation, what would you do? What do you think would happen next in the situation?”, and write up lists of tastes/​principles they actually follow.

  • Aspiring “general research-taste havers” look at each exercise, attempting to use general reasoning skills to get the right answer, and reflect on why they got the answers right or wrong. They also attempt to generate principles to follow from, well, first principles, and see how many they correctly identify.

  • Between each exercise, reflect on how they could have arrived at the right answer.

The hope is that after doing that in a bunch of fields with different constraints, they’ll have some kind of feel for “which sort of intuitions generalize” and which don’t, and when they approach the overall field of “somehow design AIs that scale safely to superintelligence”, they’ll have reasonable intuitions for navigating between agent foundations, interpretability, control, etc.

This agenda feels cool to me, but currently I grudgingly admit to myself that this would take a hella long time and not obviously work that well.

I think some portions of it are still a good idea to build out for individual research domains. (i.e. Chris Olah’s exercises seem like good things to do in whatever domain you end up specializing in)

#4. Filtering/​enculturation for “Overall Community Epistemic Health”

I think a valuable service CFAR provided was “creating a recruitment/​filtering/​enculturation pipeline”, which resulted in a large cohort of people able to think sanely about important topics. This is notably different from “train rationality skills”, it’s more of a soft nudge on the overall ecosystem culture.

I would not feel comfortable directly optimizing for this goal. It feels pretty easy to delude yourself about. I like that most of my ideas here involve concrete tests for “you should be able to see people tackling an array of harder and harder problems in different domains.”

But I still feel like this is a gap in the current ecosystem. When I imagine pivoting entirely to “help individual good researchers” and “train/​deploy thinking assistants”, I feel a sadness about giving up on the part of this project that seemed likely to help the broader community culture. I feel unsure how to weigh this, but I do weight it non-zero.

#5. Investigating an “s factor”?

This is less of “a plan” and more of “a model”, but, something that’s really weirded me out about the literature on IQ, transfer learning, etc, is that… it seems like it’s just really hard to transfer learn. We’ve basically failed to increase g, and the “transfer learning demonstrations” I’ve heard of seemed pretty weaksauce.

But, all my common sense tells me that “general strategy” and “responding to novel information, and updating quickly” are learnable skills that should apply in a lot of domains.

My current model is: IQ tests are designed to test competence quickly, and they typically give you a barrage of questions that you only have a couple minutes for, max. They test which people have the raw horsepower to process information quickly and respond on the fly. It makes sense if that’s fairly hardwired and hard to improve on.

But, it seems to me that in order for strategy/​general-creativity training to matter, it needs to operate on problems large enough that “planning” is an important subcomponent.

Hypothetically, it seems like you could construct an IQ-ish test, where the questions are expected to take a smart person at least an hour, and where the domain of each question is different so it’s hard to train for. My implicit model is something like “in addition to g factor, there’d turn out to be an ‘s factor’ (i.e. “slow intelligence”) that is a product of both “g” and “general reasoning skills.”

This seems very expensive to test and Do Science To. I think it’d be cool if humanity overall were designing longrunning experiments or longitudinal studies around this, but I don’t think it’s competitive enough as an “x-risk intervention.”

It’d be cool if a second group also worked towards “rationality skill assessment.”

I’m currently trying to bootstrap both “a training program” and “an evaluation process.” They both seem necessary. I’m not sure if I’m going to end up sticking with my “Test Driven Development” approach, but I put moderate odds on that.

But, in 3 Levels of Rationality Verification, Eliezer notes:

This question of “verification methods good enough to build organizations,” is a huge problem at all levels of modern human society.

If you’re going to use the SAT to control admissions to elite colleges, then can the SAT be defeated by studying just for the SAT in a way that ends up not correlating to other scholastic potential? If you give colleges the power to grant degrees, then do they have an incentive not to fail people?

(I consider it drop-dead obvious that the task of verifying acquired skills and hence the power to grant degrees should be separated from the institutions that do the teaching, but let’s not go into that.)

If I’m building my own training and tests, there’s always the risk of ending up “teaching to the test”, even if unintentionally. I think it’d be cool if other people were working on “Holdout Questions From Holdout Domains”, that I don’t know anything about, so that it’s possible to test if my programs actually output people who are better-than-baseline (controlling for IQ).

This could be something like “TripleByte for Reasoning Skills”, and its primary role might be something like “a place that orgs can outsource difficult interview questions to” for hiring.

What Have I Actually Done?

That was a lot of philosophy. Here’s what actually happened:

I focused on this while the MATS program was running at Lighthaven (where I work). MATS scholars seemed like a good potential target audience.

Things I ended up doing:

Experimented with Toybox Exercises

  • Ran a one day “Basic Metacognition” workshop (based on Exercise: Solve “Thinking Physics”)

  • Held followup 1-1 workshops with 3 MATS scholars, doing the Planmaking and Surprise-Anticipation exercise.

  • Experimented with GPQA questions, which are hard problems written by grad students in physics, chemistry, and biology. (Where, for example, a chemistry major wouldn’t reliably get the answer to a physics or biology question in 30 minutes, even with Google.)

  • Eventually hashed out the “multi-hour, multi-domain confusing-problem test” as the benchmark to be shooting for.

  • Experimented with an exercise where people had to find a bug in a small codebase, without running the code.

  • Experimented with an exercise applying “OODA loops” to the game Patrick’s Parabox.

Experimented with “make and compare plans, for real”

  • So far done with myself, Eli Tyre, and Robin Goins

  • This seems to depend a lot on where people are starting from

  • Involves:

    • figure out what your goals are

    • make at least 5 plans that can achieve those goals

    • reflect on the assumptions in each plan

    • try to do a fermi estimate on the value of each plan

    • iterate on the plans
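The fermi-estimate step above can be sketched as a few lines of Python. Everything here is hypothetical: the plan names, probabilities, and payoffs are numbers I invented purely to show the shape of the calculation, not estimates from the post.

```python
# Toy fermi-estimate comparison of plans. All numbers are made up.

plans = {
    # plan: (P(plan works), researchers affected, hours saved per researcher)
    "run 10x-planning workshops": (0.3, 30, 100),
    "coach senior researchers 1-1": (0.5, 5, 300),
    "build thinking-assistant pipeline": (0.2, 15, 400),
}

def expected_hours(p_works, researchers, hours_each):
    """Expected 'x-risk thinker hours' gained, under the toy model."""
    return p_works * researchers * hours_each

# Rank plans by expected value, highest first.
ranked = sorted(plans.items(),
                key=lambda kv: expected_hours(*kv[1]),
                reverse=True)

for name, params in ranked:
    print(f"{name}: ~{expected_hours(*params):.0f} expected hours")
```

The point of writing the estimate down, even this crudely, is that it forces the load-bearing assumptions (probability of working, reach, effect size) into the open where they can be argued with and iterated on.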

Experimented with “prediction mindset”

  • I’m trying out “make lots of predictions about my project and thought processes.” I think this might evolve into an important skill, although it’s not there yet.

  • It was bottlenecked on: “it’s hard to make predictions.” It was high friction to open up Fatebook.io, it was hard to operationalize predictions that mattered, and it was hard to make predictions about my thought process without disrupting my thought process.

  • I made progress via:

    • Discovering the Fatebook Chrome extension, which makes it much easier to jot quick predictions down in whatever program I’m in.

    • Establishing a TAP (trigger-action plan) for “notice I just had an insight that feels ‘promising’” → “immediately write down PROMISING in my notes.” I can come back to flesh out why it felt promising, and how to make predictions about it, later when I finish my thought process.

    • Experimenting with “write how a prediction felt rather than giving it a number”

  • Demonstration of Fatebook Chrome Extension. I notice I haven’t yet made a prediction about ‘Writing down PROMISING’, so, let’s do that now:
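Separately from the Fatebook demonstration itself, the scoring side of a prediction log can be sketched like this. The logged predictions below are invented examples; the Brier score is just the mean squared error between stated probabilities and outcomes (0 is perfect, 0.25 is what always guessing 50% gets you).

```python
# Minimal sketch of scoring a prediction log, in the spirit of what
# Fatebook reports. The predictions themselves are invented.

predictions = [
    # (claim, stated probability, did it happen?)
    ("I'll finish the workshop design this week",     0.7, True),
    ("This 'PROMISING' insight will survive writeup", 0.6, False),
    ("At least 3 MATS scholars attend the followup",  0.8, True),
    ("The GPQA exercise takes under 3 hours",         0.4, False),
]

def brier_score(preds):
    """Mean squared error of probabilities vs outcomes (0 = perfect)."""
    return sum((p - (1 if happened else 0)) ** 2
               for _, p, happened in preds) / len(preds)

print(f"Brier score: {brier_score(predictions):.3f}")
```

A running score like this is one way to make “prediction mindset” self-correcting: the habit produces a number that gets better or worse as your calibration does.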

Think conceptually and learn about the field

  • Argued a bunch with Eli Tyre and Oliver Habryka about whether various versions of the project made sense. Notable points of confusion/​disagreement were:

    • Exactly how worrisome are the warning-skulls from the psychometrics and educational literature?

    • How can we test that any of this is real, and applies in real life?

    • Do people have “traits” that aren’t really mutable (other than raw g) which determine whether they can do certain types of cognitive work?

  • Poked around a bit in the literature myself

  • Hired someone to do a literature review on transfer learning and metalearning

What’s Next?

I’m currently running at this project for another ~month. I’m hoping to end up with some kind of weeklong beta-test workshop at the end of it.

After that, I’ll take a break, evaluate whether this seems longterm promising, and figure out whether there is funding to do the scaled up version of this thing. (My ideal version of this involves hiring textbook authors from various fields, puzzle designers, expert tutors, etc).

A major crux will be: “would people actually pay enough money to cover the salaries of the people developing the curriculum and implementing any coaching or workshops that follow?”

  1. ^

    See also: eliminating the feeling of idea scarcity.