The Strategic Level

This is part 29 of 30 in the Hammertime Sequence. Click here for the intro.

I find myself dragging my feet on the last couple of days of each Hammertime cycle. From this and several other data points, I think my current writing attention span is around a week, and drafts and outlines that sit for more than a week feel too stale to finish. Had I known this in advance, I would probably have structured Hammertime as six 5-day sprints.

Reinforcement Learning?

What happens when reinforcement learning isn’t enough?

You are playing a game of Go against sensei. On move twenty-four, sensei invades your three-space extension with devastating precision, cutting a group you thought was safe into two scattered dragons. The left dragon tries to run away, but sensei cuts its escape route off with a delicate leaning attack on your corner enclosure. It dies with abandon.

The right dragon, now facing the massive wall sensei built up by attacking the left group, tries frantically to make life locally. Its second eye is poked out unceremoniously by a well-placed tesuji. Because of your struggle, sensei has fifty points of territory and thickness radiating across the entire board. You resign.

What is a novice supposed to learn from a game like this? If your teacher leaves you to your own devices to review the game, you might easily conclude any of the following, if not a dozen other things:

  1. Don’t make three-space extensions.

  2. Never try to run away.

  3. Do not respond to leaning moves.

  4. Sacrifice early.

  5. Study life and death.

Let’s say you learn lesson 1: don’t make three-space extensions. In the next week’s teaching game, you dutifully plod out two spaces from each approach. Sensei’s stones are balanced and efficient while yours are over-concentrated and unimaginative. You lose handily on points.

What happens now? Do you return to three-space extensions, frustrated with two-space ones?

Over-correction and Learning Stopsigns

The Strategic Level is a CFAR flash class about learning strategically: updating in such a way that will actually prevent the same failure modes in the future. The kind of learning above is definitely not strategic.

As I see it, there are two common and overlapping kinds of failure modes in learning, where the lessons learned can be worse than nothing.

The first kind is over-correction:

Had an argument: “I should be more understanding.”

Had a panic attack: “I should just care less about everything.”

Was a White Knight at Dragon Army: “I should just never trust human beings.”

Lost a Go game: “I should never make three-space extensions.”

Such overly general lessons can be cures worse than the disease. As your simple strategies progressively fail, you need to come up with and try more and more complicated strategies. You can’t just continually bounce between two extremes, refusing to stare the complexity of reality in the face.

The second type of failure is similarly unproductive:

I should have just read out that dan-level life and death problem!

I should have just studied chapter 3 instead of chapter 2!

I should have just tried to use the polynomial method on that problem!

I call these thoughts learning stopsigns. A common type of learning stopsign is of the form “should have done so and so,” where so and so is some arbitrary, brilliant, unreasonable choice you would never have made in advance. Just as semantic stopsigns masquerade as answers, learning stopsigns masquerade as lessons learned while not actually providing practical utility for the future.

The learning stopsign simply says: turn back, nothing to see here, painful thoughts past this point. It’s usually accompanied by a nonchalant shrug.

Strategic Learning

What does it mean to learn strategically?

Whenever you fail, try to answer the question, “What way of thinking would I have had to employ to have caught this problem ahead of time?” Every lesson learned is a chance to tune your cognitive strategies to prevent as wide a class of similar problems as possible in the future.

At the very least, learn to recognize unproductive over-correction and to drive past learning stopsigns. When you encounter a failure and make a snap judgment about what went wrong, ask yourself: is it any less likely I’ll fail in the same way again?

Exercise: Set a Yoda Timer and meditate on your most recent mistakes.

Daily Challenge

Share a story of a cure that was worse than the disease.