Exercise: Planmaking, Surprise Anticipation, and “Baba is You”

This is an exercise about Planmaking and Surprise-Anticipation. It takes about 2-3 hours. It’s a small, simplified exercise, but I think it’s a useful building block.

Humans often solve complex problems via iteration and empiricism. Usually, trying to figure everything out from first principles without experimenting is a bad idea. You can spend loads of time thinking, and then you go outside and interact with reality for 5 minutes and realize all that thinking was pointed in the wrong direction.

But some important problems have poor feedback loops, such that iteration/​empiricism don’t work very well. Experimentation might take a really long time, the results might be noisy, or you might just really need to get something right on the first try.

Often, when making a plan in a confusing domain, it’s enough to just ask yourself “how do I expect this plan to turn out?” to get you to notice ways the plan is likely to fail. Then you can fix those things. This is often faster than doing the entire plan, and watching it fail, and then doing it all over again.

Side note: a particular worry I have is that a lot of people entering the AI Alignment space don't feel like it's tractable to tackle more theoretical research directions, and end up gravitating to interpretability or evals because they at least have a feedback loop. One thing I think this exercise is "for" is laying some building blocks for "how to think about a situation where your feedback loop is terrible, and eke as many bits out of it as you can to help you focus your strategy."

I don't know whether this will successfully transfer to the domains I care about, but that's one thing the exercise is aiming at.

This exercise uses the Baba is You videogame to teach a combination of rationality skills (which I suspect weave together into something greater than the sum of their parts):

  • Planmaking

  • Calibration

  • Inner Sim /​ Internal Surprise-o-meter

  • Patience

Those skills weave together into something similar to Murphyjitsu, but with a somewhat different flavor. The exercise is intended to build upon Exercise: Meta-strategy [TODO], and is a stepping stone building towards Generating 10x Plans.

I’ve tested this on ~6 people, including myself. So, this is still experimental, but I feel good enough about it to ship it publicly for now. Let me know if you try it. You can post your results in the comments (please spoiler-block them).

Format:

  1. You’ll be given a puzzle video game level, which you haven’t played before.

  2. Instead of fiddling around, playing with the game the way you might normally do… you will just look at the screen, and make a complete plan for solving a given level, before you begin to move your character around.

  3. Write down that plan as a series of steps.

  4. Before you execute your plan, for each step in the plan, consider all the ways that you might be surprised when you execute that step.

  5. Loop through all of your “possible surprises”, and consider if any of them actually seem more likely than your mainline plan. Consider updating your plan. If there is a step that might go multiple ways, try making multiple guesses and plans.

  6. Are you confident in your plan? If so, execute it.

  7. Did the plan go the way you expected? Spend 10 minutes reflecting on what you learned, and what you could have done differently.

I recommend doing the exercise using this google doc worksheet.

(I think this exercise would work well as a meetup where one meetup-organizer has read the post thoroughly, has already done the exercise once, and can help other people who are confused or stuck)

Step 0: Read this blogpost

This is a fairly involved exercise. I’ve broken it down into steps so you only have to think about one thing at a time, but it’s useful to first read through the whole thing so you can see how all the pieces fit together.

Step 1: Download the Game, Pick a Level

We’re practicing planmaking in a game called Baba is You. Baba is You is a really great puzzle game, but we’re adding some additional wrinkles.

My favorite version of this exercise is one where you’ve never played Baba is You before, and part of the task is figuring out the core gameplay mechanics without even interacting with the game. (If you haven’t heard anything about the game before, I highly recommend not looking anything up first)

If you’re a rationalist nerd on LessWrong, you’ve probably heard about or even played the game. That’s fine. Unless you’ve literally beaten the entire game, this exercise works if you play a level you haven’t played before. (I recommend picking a level that introduces at least one new mechanic you haven’t seen before)

So, go download the game on Steam, or whatever device you prefer.

If you’ve never played the game before, I recommend starting with a particular level I made, via:

  1. Click “Play Levels” from the menu

  2. Click “Get new levels”

  3. Click “Use level code”

  4. Enter: “6KQL-TPUB”

Alternately you can play the normal game from the beginning. The first couple levels might be fairly easy. I recommend following all the exercise steps for the first 2 levels, but it’s okay to do them a bit more quickly, and save your “try very hard to get it right the first time” for a later level.

If you’re just starting the game, you have to do the first levels 0 and 1, but then I recommend skipping to levels 3, 4 and 7. (Levels 2, 5 and 6 each have elements that make them somewhat worse exercises here IMO)

Remember to pull up the google doc worksheet.

Step 2: Observation, Orientation and Livelogging

Once you’ve gotten to a level that seems like a good fit, stare at the level for a bit and soak in the details. It will look something like this:

[Image: a screenshot from Baba is You, showing simplistic graphics in the form of colorful block puzzles.]

I recommend writing down all the details that seem relevant.

More generally: I recommend livelogging. Jot down notes about your thought process as things occur to you.

After soaking in the level, start thinking through how to solve it.

If you’re new to the game, you might ask “How do I solve it? What are the rules? I don’t even know what it means to beat a level of this game.” Part of the point of this exercise is you do have information about that, even without having played the game. You’ve played other games before. If you haven’t, you’ve probably interacted with the world and you can make an informed guess about what you need to achieve.

A level might introduce multiple new mechanics you haven’t encountered before. For each new element, I suggest coming up with some guesses as to how that element will behave, or what happens when you try to interact with it.

Step 3: Write out your plan

Okay, now start writing down your best guess plan for solving the level.

This can include things like “I think if I do X, it’ll most likely cause Y to happen, but might cause Z to happen instead. If Y happens, I’ll do Plan A, if Z happens, I’ll do Plan B.”

(You don’t need to go crazy branching out on every possible option for this exercise, just pick the most likely branch, and maybe 1-2 backup plans)

Step 3A: If you get stuck, brainstorm strategies

If you feel like you have no idea what to do, stop and go meta. Try spending 10 minutes brainstorming strategies that might help.

Examples of strategies might include:

  • Try to break the problem down into subgoals

  • Notice that you’re tired, and get a snack

  • List as many dumb ideas as you can

I recommend setting a literal 10 minute timer, and trying to come up with at least 10 strategies.

Step 4: Predict Surprises

For each step in your plan, write down how likely that step is to go the way you expect (e.g. 10% likely, 50% likely, 90% likely, etc).

For each step, ask “Does this seem like an area where I expect to get surprised? What other things might happen instead of my main prediction?”

You might notice that you actually think there’s a second outcome that feels more likely than your original plan. If so, maybe update your plan + predictions.

When you are done, write down your overall probability that your plan will work.
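
One rough sanity check on that overall number: if you treat each step as succeeding independently (a simplifying assumption, and often not quite true in a puzzle game, where one misunderstanding can break several steps at once), the chance that every step goes as expected is the product of the per-step probabilities. Here's a minimal sketch in Python. The function name, the step labels, and the numbers are all made up for illustration; they aren't part of the worksheet.

```python
from math import prod


def overall_plan_probability(step_probabilities):
    """Chance that every step goes as expected, assuming steps are independent."""
    for p in step_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability must be between 0 and 1, got {p}")
    return prod(step_probabilities)


# Hypothetical plan: the step names and numbers are made up for illustration.
plan = [
    ("Step 1", 0.9),
    ("Step 2", 0.7),
    ("Step 3", 0.8),
]

overall = overall_plan_probability([p for _, p in plan])

# The step you are least sure about is a good place to look for surprises.
weakest_step, weakest_p = min(plan, key=lambda step: step[1])

print(f"Overall (if steps are independent): {overall:.0%}")
print(f"Least confident step: {weakest_step} ({weakest_p:.0%})")
```

Three steps that each feel fairly safe (90%, 70%, 80%) multiply out to roughly 50% overall. If your gut-level overall estimate is much higher than the product, that gap is itself worth a second look before you execute.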

Step 5: Execute the Plan

Once you have a plan written down, and you’ve thought about how likely you are to be surprised… execute the plan!

...

What happened? Did you get it right on the first try? If so, yay! If not, think more, and see if you can come up with a new plan.

Did you get any “surprise surprises” (as opposed to “surprises in a place you kinda expected to get surprised”)?

Step 5B: Try earnestly 3 times, then, idk, screw around

If your plan didn’t work, go back to the drawing board and try again. You’ve lost some imaginary points from Raemon, but, you can still try again to make a followup plan. If your assumptions were wrong, re-examine them and think about what else might work.

Each time you try/​fail, I recommend setting another 10 minute timer for “meta-strategy brainstorming.”

If you’ve tried this earnestly 3 times, after the 3rd time, I think it’s fine to switch to just trying to solve the level however you want (i.e. moving your character around the screen, experimenting).

Step 6: Debrief /​ Meta Reflections

Whether you got the answer right or wrong, now you stop to ask “how did I do? Could I have done better?”

Set a 10 minute timer, and brainstorm potential takeaways. Some possible prompts:

  • What were some useful thoughts that you thought?

  • What were the key pivot points in your thinking?

  • How could you have gotten to the right concept faster?

  • Summarize the key concept of the solution.

    • Is there an abstract generalization of that concept?

    • How does that generalization apply to other problems?

  • What thinking approach brought me to the right answer?

    • Does that approach generalize?

Followup

I think it’s worth doing this exercise a couple times until you’ve gotten the hang of it. You can do it on different puzzle games.

The next step after that is “try making some actual real life plans for goals that feel somewhat hard/​confusing”, and reflect on which parts (if any) seem to transfer. I’m working on some explicit exercises for this, but so far it seems to depend a lot on an individual’s goals. So far, this process seems to take a few days rather than a few hours.