Framing Practicum: Dynamic Programming

This is a framing practicum post. We’ll talk about what dynamic programming (DP) is, how to recognize DP in the wild, and what questions to ask when you find it. Then, we’ll have a challenge to apply the idea.

Today’s challenge: come up with 3 examples of DP which do not resemble any you’ve seen before. They don’t need to be good, they don’t need to be useful, they just need to be novel (to you).

Expected time: ~15-30 minutes at most, including the Bonus Exercise.

What is DP?

Suppose I am about to drive from Miami to Boston and I need to get there as fast as possible. As a first step, I check the highway map and create a list of possible routes for the trip (let’s assume the “good” old times with no Google Maps). For instance, looking at the imaginary routes in the figure below, the route at the top says I should take the “Miami → A → C → E → G → Boston” route, and the total trip distance would be 200+170+220+400+430=1420 miles. Which specific route should I take to minimize the total distance, and thus the total travel time? I can, of course, calculate the total travel distance for each possible route and pick the one with the least distance. But that could easily get very time-consuming if there are hundreds of thousands of possible routes to evaluate.

An alternative way to identify the minimum-distance route is to work backward. Suppose I drive backward through the route map from right to left. I start at the destination, Boston. If I am in city G or H, I have no further decision to make, since there is only one route that leads me to Boston from either city. The number in green summarizes the total distance with one last trip, or one stage, to go.

My first real decision (going from right to left) occurs with two trips, or stages, to go. If, for example, I am in city F, I can either drive 150 miles to city G and another 430 miles to Boston (580 miles total), or drive 200 miles to city H and another 400 miles to Boston (600 miles total). Therefore, the shortest possible distance, or optimal route (in green), from city F is 580 miles (F → G → Boston). The alternative route from F (F → H → Boston) is suboptimal at 600 miles (in red).

Now let me back up one more city, or stage, and compute the least distance from cities C and D to Boston. The figure below summarizes these calculations.

Once I have computed the optimal routes from cities C and D onward to Boston, I can again move back one city and determine the optimal routes from cities A and B onward.

I continue this process and end up with the optimal route for the whole problem, with a least distance of 1080 miles (highlighted in bold arrows):

This is DP: an agent faces a multistage optimization problem (travelling from Miami to Boston through multiple cities). At each stage (e.g., one more trip to go before Boston), the agent might be in a different state (e.g., I am currently in city A or city B). Based on the current state (e.g., I am in city A), the agent takes a specific action (e.g., I will drive to city C), and as a result of that action, the system transitions to a different state in the next stage (e.g., now I am in city C). We solve the multistage optimization problem by working backwards, at each step computing the best reward we can get from that stage onward by considering each of the possible “next/child” stages.
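To make the backward pass concrete, here is a minimal Python sketch. It encodes only the fragment of the map whose distances the example fully states (cities F, G, and H); the rest of the map is omitted, and the reverse stage order is hand-written rather than computed.

```python
# Backward induction on the fragment of the road map whose distances the
# example states explicitly (F -> G/H -> Boston); all other edges omitted.
roads = {
    "F": [("G", 150), ("H", 200)],
    "G": [("Boston", 430)],
    "H": [("Boston", 400)],
}

def least_distances(roads, goal="Boston"):
    """Work backwards from the goal, one stage at a time."""
    best = {goal: 0}  # base case: zero miles to go once we are in Boston
    choice = {}       # best next city to drive to from each city
    # Hand-written reverse stage order for this tiny map; in general we
    # would topologically sort the cities first.
    for city in ("H", "G", "F"):
        best[city], choice[city] = min(
            (miles + best[nxt], nxt) for nxt, miles in roads[city]
        )
    return best, choice

best, choice = least_distances(roads)
print(best["F"], choice["F"])  # 580 G  (F → G → Boston, as computed above)
```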

What To Look For?

DP should come to mind whenever an agent faces a problem with a multi-stage nature and takes a series of actions. Another defining feature of DP is that the original complex multi-stage problem can be dismantled into a sequence of simpler, smaller problems. The action the agent takes at a particular stage depends on the current state and on the reward the agent would receive for taking that action. In addition, the action the agent takes impacts the state of the system, causing the system to transition to a new state.
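To see the “dismantling” concretely, here is a minimal top-down sketch of the same driving example: the big problem (least distance from some city to Boston) is broken into one smaller copy of itself per next city, and memoization ensures each subproblem is solved only once.

```python
from functools import lru_cache

# Same toy fragment of the map as before: only the edges whose distances
# the example states explicitly.
roads = {
    "F": [("G", 150), ("H", 200)],
    "G": [("Boston", 430)],
    "H": [("Boston", 400)],
}

@lru_cache(maxsize=None)  # memoize: each city's subproblem is solved once
def dist_to_boston(city):
    """Least driving distance from `city` to Boston."""
    if city == "Boston":
        return 0  # base case: already at the destination
    # The big problem is dismantled into one smaller problem per next city.
    return min(miles + dist_to_boston(nxt) for nxt, miles in roads[city])

print(dist_to_boston("F"))  # 580
```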

Useful Questions to Ask?

In the shortest driving time example, the ultimate goal is to minimize total driving time such that I can arrive at Boston as fast as possible. At any given time in my trip, I might be in a different state—the city I am in at that time. For instance, on the second day of the trip, I might be in city C or city D. Given my state, I have a simpler problem to solve: What is the shortest travel time from city C or city D to Boston?

The system may start in different states. The agent takes a series of actions to optimize an objective across multiple stages. Each stage also has multiple possible states. The specific actions an agent can take are a function of the state the agent is currently in.

In general, whenever we see problems where DP is applicable, we should ask the following questions (a generic code sketch follows the list):

  • What is the objective?

  • What are the stages and states of the system?

  • What are the actions the agent can take at any given state?

  • How does a specific action change the state of the system?

  • What is the value function? How is an agent rewarded for taking a particular action at the current stage?
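As a rough illustration (every name here is a placeholder, not any standard API), this skeleton maps each question onto a piece of the program: the objective is whatever `value` maximizes, and `states`, `actions`, `transition`, and `reward` are exactly the things the questions ask you to pin down.

```python
def solve(num_stages, states, actions, transition, reward, terminal_value):
    """Generic backward-induction sketch. The modeler supplies:
      states(t)           -> states the system can be in at stage t
      actions(t, s)       -> actions available in state s at stage t
      transition(t, s, a) -> the next state after taking action a
      reward(t, s, a)     -> immediate reward for taking action a
      terminal_value(s)   -> value of ending in state s
    """
    # Base case: the value of each state once all stages are done.
    value = {s: terminal_value(s) for s in states(num_stages)}
    policy = {}
    # Work backwards: a state's value is the best immediate reward plus the
    # already-computed value of the state the chosen action leads to.
    for t in reversed(range(num_stages)):
        next_value, value = value, {}
        for s in states(t):
            best_a = max(
                actions(t, s),
                key=lambda a: reward(t, s, a) + next_value[transition(t, s, a)],
            )
            policy[(t, s)] = best_a
            value[s] = reward(t, s, best_a) + next_value[transition(t, s, best_a)]
    return value, policy
```

The driving example fits this skeleton by treating “trips remaining” as the stage, cities as states, and negated distances as rewards (so that maximizing reward minimizes distance).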

The Challenge

Come up with 3 examples of DP which do not resemble any you’ve seen before. They don’t need to be good, they don’t need to be useful, they just need to be novel (to you).

Any answer must include at least 3 to count, and they must be novel to you. That’s the challenge. We’re here to challenge ourselves, not just review examples we already know.

However, they don’t have to be very good answers or even correct answers. Posting wrong things on the internet is scary, but a very fast way to learn, and I will enforce a high bar for kindness in response-comments. I will personally default to upvoting every complete answer, even if parts of it are wrong, and I encourage others to do the same.

Post your answers inside of spoiler tags. (How do I do that?)

Celebrate others’ answers. This is really important, especially for tougher questions. Sharing exercises in public is a scary experience. I don’t want people to leave this having back-chained the experience “If I go outside my comfort zone, people will look down on me”. So be generous with those upvotes. I certainly will be.

If you comment on someone else’s answers, focus on making exciting, novel ideas work — instead of tearing apart worse ideas. Yes, And is encouraged.

I will remove comments which I deem insufficiently kind, even if I believe they are valuable comments. I want people to feel encouraged to try and fail here, and that means enforcing nicer norms than usual.

If you get stuck, look for:

  • Problems with a multistage nature.

  • Problems that can be dismantled into a sequence of simpler problems.

Bonus Exercise: for each of your three examples from the challenge, explain:

  • What are the simpler and smaller problems in the DP example?

  • What are the states, and how does taking a specific action alter the state?

  • What is the reward for taking a specific action in a given state?

This bonus exercise is great blog-post fodder!

Motivation

Using a framing tool is sort of like using a trigger-action pattern: the hard part is to notice a pattern, a place where a particular tool can apply (the “trigger”). Once we notice the pattern, it suggests certain questions or approximations (the “action”). This challenge is meant to train the trigger-step: we look for novel examples to ingrain the abstract trigger pattern (separate from examples/contexts we already know).

The Bonus Exercise is meant to train the action-step: apply whatever questions/approximations the frame suggests, in order to build the reflex of applying them when we notice DP.

Hopefully, this will make it easier to notice when a DP frame can be applied to a new problem you encounter in the wild, and to actually use it.