Some notes on solving hard problems

Caveats/​Notes:

  • This is a cleaned-up version of some personal notes that I’ve had sitting in Google Docs for a while. There’s a lot of hand-waving, probably some over-generalising, and many points could do with some/​more examples. Please take each suggestion with a large grain of salt, and feel free to add your thoughts via the comments.

  • I’m not sure how far these thoughts will transfer outside of my particular problem domain, and problem-solving style. I tend to solve problems by building up visual intuitions/​models/​algebras, rather than leaning heavily on symbols/​equations.

  • There are some overlaps between various suggestions below—they shouldn’t be read as completely orthogonal points. Some ideas are special cases of other ideas, or are basically the same idea from a different perspective.[1]

  • By “hard” problems, I mean hard for me—as in, something that I wouldn’t expect to be able to sit down and solve in one session, say, and which might end up taking days, weeks or longer. That said, some of the stuff here is probably relevant for problem solving more generally.

Develop multiple perspectives on the problem

You should work hard to “see” your problem and potential solutions from multiple perspectives. By “perspective” I mean a high-level mental model that you use to frame your thinking about the problem. A perspective might center around an analogy, or a particular data structure, or a line of reasoning rooted in some low-level principles.

Once you’ve got multiple perspectives, you can play the perspectives against one another to get a more fundamental understanding of the problem. One perspective will give you insights which unblock your progress on another perspective.

Each perspective is just a mental tool to help you think about the problem—it is specialised for understanding certain types of behavior/​constraints that are difficult to understand with the other perspectives.

Good perspectives make important aspects and constraints of your problem “obvious”. They make it easy to quickly reason about your problem, and neatly encapsulate constraints so that it requires little effort to keep them all in your head. If you find yourself having to “strain” your brain to pull implications out of a mental model, then it may be that you’re using the wrong one for that particular task[2].

A quote from Claude Shannon that I found on this topic:

[...] try to restate it in just as many different forms as you can. Change the words. Change the viewpoint. Look at it from every possible angle. After you’ve done that, you can try to look at it from several angles at the same time and perhaps you can get an insight into the real basic issues of the problem, so that you can correlate the important factors and come out with the solution. It’s difficult really to do this, but it is important that you do. If you don’t, it is very easy to get into ruts of mental thinking. You start with a problem here and you go around a circle here and if you could only get over to this point, perhaps you would see your way clear; but you can’t break loose from certain mental blocks which are holding you in certain ways of looking at a problem. That is the reason why very frequently someone who is quite green to a problem will sometimes come in and look at it and find the solution like that, while you have been laboring for months over it. You’ve got set into some ruts here of mental thinking and someone else comes in and sees it from a fresh viewpoint.

Start the jigsaw puzzle with the corner pieces

The corner pieces of a jigsaw puzzle are the easiest to place. Lock in the “easy certainties” first, and then build more “solid ground” outwards from them—work in the heavily-constrained areas. Speculating on the super “fuzzy” areas (e.g. the center of the jigsaw puzzle) is inefficient—you’ll end up testing many more approaches/​solutions than you actually need to. As you get closer to the solution, you have more constraints, so your search space is much smaller—it’s very easy to place the last few pieces.[3]

Said another way: Find a small set of things that you’re very sure about, and “anchor” on these things, and then ask “What do they imply?” There can be a tendency to get lost in a sea of speculative ideas—just wandering along randomly, following things which seem like they might be fruitful. If you can instead find a couple of things that you (almost) “know” to be true, and find the things that they definitely imply, then you can keep anchoring on the successive implications and climbing towards the solution in a more constrained and methodical manner.[4]

Don’t waste a dead end

If you feel like you’re on the right track at some deep level, but you reach a dead end that has you frustrated and completely at a loss, keep pushing[5]. It’s exciting to be at a dead end of an approach that otherwise seems great, because it might mean that you’re close to a big shift in one of your fundamental assumptions. It’s the moment when you’re most likely and most prepared to start “questioning everything”. You’ve explored all the inside-the-box solutions, now you’re primed to try some outside-the-box ones—don’t skip that opportunity!

Even if you end up having to move on, you should take note of the absurd/implausible adjustments to your assumptions that you’d have to make to “get past” this road block. Turn those over in your mind (even if they don’t really make sense), because next time you come across a road block you’ll do the same thing, and that’s when you might think “Oh, this sort of seems related to that absurd workaround that I needed for that other approach.” That gives you a strong signal to think about it more.

I’m guessing that the speed of light being the same in all reference frames was one of these “absurd” workarounds/​assumptions that turned out to be correct. Try “believing” what your approach is saying it needs. As a way to escape ingrained assumptions, you almost want to pretend you’ve got a “hack” that can defy logic to help you push past this dead end. What would that “hack” be? And once you’ve applied the “hack”, how could your model “transform around it” such that it actually makes sense as an approach?

Feed your visual cortex

Draw diagrams that elegantly represent the relationships that you’re trying to internalise/​understand. Use different shapes, colors, shades, textures, thickness, angles, area, etc. to represent the data so you don’t need textual “labels” (which aren’t easy to manipulate in your mind). I don’t know enough neuroscience to make broad claims here, but if you’re anything like me, the visual part of your brain plays a large part in reasoning about the relationships in your problem, so try to help it out as much as possible. Your goal is to create “interactive animations” in your mind, which will help you mentally play with the system and understand its behavior.

If you’re not using diagrams to help you think about some new system, then you’re probably wasting your working memory. Offload as much of the problem as you can onto your tools so that you can use your brain to hold onto several abstract conditions-to-be-satisfied at once, and to do very abstract manipulation that is perhaps hard to visually represent. Diagrams can help you easily fit a problem in your brain that would otherwise require a lot of “brain strain”.

Each “perspective” that you develop will have its own set of interactive animations: its own “visual algebra” with components that can be intuitively manipulated to derive otherwise-non-obvious insights, behaviors, and constraints of the system.

Abstract it, and then make it tangibly concrete

Try to pull out the underlying mechanics of a problem, and then try to turn it into a “real-world” riddle or thought-experiment. The “real-world tangibleness” of the riddle that you create is the important part, because it allows you to lean on intuitions that have been sharpened during your everyday life, and that you wouldn’t otherwise be able to use during your problem solving process.[6] You’re essentially “porting” your problem to another “mental algebra”—one that you’re more familiar with since you use it to navigate the world every day. This won’t always make sense as a strategy, but it’s one tool in your toolbox that can sometimes be useful to find new perspectives on the problem.

Note: This isn’t quite the same as the colloquial “reasoning by analogy” since you’re not trying to pull out “fuzzy” insights from an abstractly-related, but concrete real-world system[7]. You’re instead constructing a hypothetical system from scratch that happens to use real-world objects/​dynamics simply because it can sometimes help you frame the problem more intuitively. That said, you do still need to be careful that you’re not fooling yourself, because you may accidentally lean on an irrelevant feature of some real world object/​dynamic in your reasoning.

Name your concepts

You’ll inevitably end up inventing new concepts, and you’ll be able to think about them more easily if you give them names. Problem solving is mostly about creating good abstractions such that you’re able to compress the problem space enough to fit it in your head. By giving names to the things/​aspects/​concepts/​ideas that regularly show up in your thinking, you’ll be able to use them in your thought processes more easily—like encapsulating some regularly-used code into a function or class.

Make sure you’re very clear on what the name refers to. It’s surprisingly easy to trick yourself into thinking that you understand something when you’ve got a name for it. Unpack your abstractions if you feel like you’re getting muddled.

A related point: Don’t be afraid to invent your own symbols and notations where it makes things significantly neater—no qualifications are needed for this.

Take tiny steps

It can sometimes be tempting to make a “leap” towards some interesting idea that you spot in the distance—especially if you’re currently a little “stuck”. These sorts of leaps sometimes make sense, but you should really have a good reason to abandon your current approach—not something like “Hmm, maybe this other thing will work.” It’s easy to fall into an inefficient “guess-and-check” mode of problem solving. If you’re stuck, you should be asking “Fundamentally, what is wrong with this solution/​approach?” and base your next step on the answer.

By taking small, well-thought-out steps, you’re forcing yourself to build a map of the solution space. Going slowly and carefully is a skill that you need to hone—it’s not necessarily forced on you. It’s often easy to fool yourself into thinking that you’re on the right track—this is not so much the case in many other activities, like programming, where the compiler or runtime will pretty quickly tell you that something is wrong.[8]

Improvements on your existing idea (e.g. to allow it to handle a more general class of problems) should be incremental. Where possible, only add one small piece at a time, then think about the implications, and try to test whether it makes sense. Adding several components at once is a bit like making several changes to different parts of a code base in between tests. If you’ve introduced a bug, then you’re going to have a harder time working out what’s going wrong than if you made the changes as serially as possible, testing in between each minimal set of changes.

There are elements of patience and ingrained methodology involved here, but I think being able to move very slowly also depends on having an understanding of your problem at a very low level. If you only have high-level models/​understandings, then you’ll naturally move more quickly, and you also won’t be able to move down a level in the abstraction hierarchy to find potential flaws in those high-level models that might be making them bad “tools” for helping you move toward the solution.

Surround yourself with context

Keep as much of your recent work in view as you can. For example, if I’m using pen and paper, then rather than turning the page so that it hides the last page, I use loose paper so I can sprawl out the last several pages (with important points highlighted). This helps me to hold more assumptions/conditions/requirements in my head at once, which can be useful when I’m trying to narrow down on a solution that satisfies several different conditions.

I’ve found it very helpful to have a “hyper-distilled” list of all the very important “milestones” or “sub-solutions” of the journey. I can glance at that and it lifts my head out of the “trench” so I can see the bigger picture and remember where I am, and where I’m going. It’s also useful to have lists of questions, important considerations, special cases (for quickly testing ideas), and mental “tools” (including diagrams and useful “perspectives”) that have been helpful in thinking about the problem.

That said, I think it can be dangerous to lean too heavily on context because you can get trapped in a particular mode of thinking—e.g. dismissing a particular idea because you tried it already, but not taking into account the fact that it may actually work now that several things about your model have since changed. So when something significant changes in your model, be wary of leaning heavily on previous work as a “shortcut” to eliminating/​choosing particular ideas. Old, ingrained ideas/​assumptions/​etc. can sometimes unhelpfully constrain your thinking.

Take a long break to help surface mistaken assumptions

Innocent-looking but mistaken assumptions can slowly embed themselves into your reasoning to the point of being completely unquestioned. The longer a mistaken assumption remains unquestioned, the harder it is to see that it’s the thing that’s blocking progress. One antidote here is to leave the problem alone for a while (hours, days, weeks, or even many months, depending on how long you’ve been stuck). When you come back, the assumptions won’t have as much of a hold on you. With a fresh mind you might find yourself face-palming at some obvious flaw in your previous approach.

Use special cases

It can be helpful to think through your problem using “special” or “extreme” cases. A cartoon example: If your “rabbit behavior theory” should apply to any group of grayscale-furred rabbits, then try it first on the special case of one white rabbit, for example. If it should also work for rabbits with any number of spots (assuming this is a relevant parameter), try it first on one white rabbit with zero spots. You might also want to think about what your model says should happen for rabbits as the number of spots approaches an extreme like zero or infinity.

Special cases help you to fit complicated problems in your head, and to quickly find counter-examples that’ll save you from wasting time on a bad idea. If you find yourself saying “Okay, I need to think through an example to see if this works,” you should immediately jump to a minimal special case. You can quickly reason about them without needing to “pull out your calculator” (in some sense) and they often make interesting properties very clear/​obvious. If it doesn’t work in the special case, it’s probably not a valid solution[9] and so you can quickly move along with your reasoning. If it does work, then you can try it on a few more special cases and perhaps convince yourself that this approach is worth exploring further.

Infinity and zero are common special cases, but special cases aren’t necessarily “extremes”. A special case of a particular parameter (or set of parameters) might be/​involve minimums and maximums, but it could also involve things like equality, opposites, odd, even, one, negative one, half, symmetric, all, none, 0 degrees, 90 degrees, independence/​orthogonality, similarity, 1-dimensional, infinite dimensional, rare, common, mean, median, mode, messy, clean, perfect, imperfect, square, square root, etc.

In general you’re trying to pick values/​configurations which simplify the system and perhaps “expose” a particular aspect so it’s easier to reason about. A “real world” analogy for this type of thing is when, in a discussion or debate, you try to show that a line of reasoning fails (i.e. contradicts some very strongly held belief) in the general case by pushing it toward some extreme case—it becomes much more obvious that it doesn’t work at extreme/​special cases.

It may be useful to develop a habit of trying to “outsmart” your model with minimal special cases as you develop it. That is, try to find simple examples where it doesn’t work properly.
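As a rough sketch of this habit in code (Python, with made-up names): suppose the "model" is a candidate way of computing a median, and for each tiny special case you already know what the right answer has to be. Here `middle_element`, `failing_cases` and `SPECIAL_CASES` are purely illustrative, and the standard library's `statistics.median` stands in for "what I already know must be true". In a real problem you usually won't have such a reference, only your own certainty about the special cases.

```python
from statistics import median  # stand-in for "what I know must be true"


def middle_element(values):
    """Hypothetical candidate 'median': the middle of the sorted list."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def failing_cases(candidate, reference, cases):
    """Return the special cases where the candidate disagrees with the reference."""
    return [case for case in cases if candidate(case) != reference(case)]


# Minimal special cases: one element, two equal, two different, three.
SPECIAL_CASES = [[5], [1, 1], [1, 3], [1, 2, 3]]

counter_examples = failing_cases(middle_element, median, SPECIAL_CASES)
print(counter_examples)  # [[1, 3]]
```

The two-element case exposes the flaw immediately, without needing a large or realistic input.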

A lot of my time (when working on a problem) is spent trying to prove that I’m wrong using minimal/​special cases. I take a step forward, and then attack the model from many different angles, and if it survives, then I take another step forward, and test it again, and so on.

Another common pattern for me is to first develop a solution that works in a very simple/​special case, and then add another special case and see what needs to change to be able to handle both, and then continue adding more special cases one-by-one, generalising the solution with each step.

Beware of optimising for edge-case “singularities”

Special cases are useful thinking tools, but sometimes you’ll be thinking about a particular case without realising that it’s an edge case that your model doesn’t really need to handle. Your model might only need to handle points “in the limit” as you move toward that special case—the case itself might, for example, be an infinitesimally rare situation in the “real world”.

If possible, you’ll almost always want to trade away your solution’s ability to handle rare cases for a solution that’s more efficient/​effective in common cases. That’s obvious, but it can sometimes be hard to recognise that a particular special case is rare or unimportant.

Use assumed continuity between special cases

Once you understand the correct behaviour of your model at the special cases, you might be able to rely on an assumption of underlying “continuity” to help you eliminate otherwise plausible-seeming solutions. Special cases are, in a sense, “points on a curve/​space”, so as you tweak the parameters to move away from those special points, you can use weird behavior (e.g. an unexpected “step” change) as an indication that something might be wrong.
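One way to make this concrete, as a sketch only: sweep a parameter between two special cases and flag any place where the output jumps by more than you would expect from a smooth system. The function names, the toy model and the threshold below are all assumptions made up for illustration. What counts as a suspicious "step" depends entirely on your problem.

```python
def find_jumps(model, xs, max_step):
    """Return adjacent (x0, x1) pairs where the output changes by more than max_step."""
    return [
        (x0, x1)
        for x0, x1 in zip(xs, xs[1:])
        if abs(model(x1) - model(x0)) > max_step
    ]


def toy_model(x):
    """Hypothetical model with a hidden branch at x = 0.5."""
    return x * x if x < 0.5 else x * x + 1.0


# Sweep from the special case x = 0.0 to the special case x = 1.0.
xs = [i / 10 for i in range(11)]
jumps = find_jumps(toy_model, xs, max_step=0.5)
print(jumps)  # [(0.4, 0.5)]
```

A flagged jump isn't proof of a mistake (some systems really are discontinuous), but it tells you where to look first.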

Chisel down problems before attempting to solve them

This is kind of similar to the point about “thinking with special cases” in that special cases essentially allow you to temporarily hide the “bells and whistles” (as Gallager calls them in the quote below) that make up the full/general/optimised version of your solution. All those bells and whistles make your model harder to reason about.

It’s kind of like debugging with a “reduced test case”—constrain or chop away all the irrelevant stuff before trying to debug your problem.
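For readers who haven't done this kind of debugging, here is a minimal sketch of the idea in Python. It greedily removes one piece at a time and keeps the removal whenever the failure still reproduces. The names `reduce_case` and `bug_reproduces` are hypothetical, and the "bug" is a toy condition I made up; real test-case reducers are more sophisticated than this.

```python
def reduce_case(items, still_fails):
    """Greedily drop items one at a time while the failure still reproduces."""
    items = list(items)
    i = 0
    while i < len(items):
        candidate = items[:i] + items[i + 1:]
        if still_fails(candidate):
            items = candidate  # this item was irrelevant, so leave it out
        else:
            i += 1  # this item is needed to reproduce the failure
    return items


def bug_reproduces(values):
    """Hypothetical failure: it only shows up when both 3 and 7 are present."""
    return 3 in values and 7 in values


minimal = reduce_case(range(10), bug_reproduces)
print(minimal)  # [3, 7]
```

Ten pieces shrink to the two that matter, which is roughly what stripping assumptions out of a problem is meant to achieve.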

This quote from Robert Gallager about Claude Shannon summarises it well:

I had what I thought was a really neat research idea, for a much better communication system than what other people were building, with all sorts of bells and whistles. I went in to talk to [Shannon] about it and I explained the problems I was having trying to analyze it. And he looked at it, sort of puzzled, and said, “Well, do you really need this assumption?” And I said, well, I suppose we could look at the problem without that assumption. And we went on for a while. And then he said, again, “Do you need this other assumption?” And I saw immediately that that would simplify the problem, although it started looking a little impractical and a little like a toy problem. And he kept doing this, about five or six times. I don’t think he saw immediately that that’s how the problem should be solved; I think he was just groping his way along, except that he just had this instinct of which parts of the problem were fundamental and which were just details. At a certain point, I was getting upset, because I saw this neat research problem of mine had become almost trivial. But at a certain point, with all these pieces stripped out, we both saw how to solve it. And then we gradually put all these little assumptions back in and then, suddenly, we saw the solution to the whole problem. And that was just the way he worked. He would find the simplest example of something and then he would somehow sort out why that worked and why that was the right way of looking at it.

And here is a quote from Shannon himself talking about this process:

Attempt to eliminate everything from the problem except the essentials; that is, cut it down to size. Almost every problem that you come across is befuddled with all kinds of extraneous data of one sort or another; and if you can bring this problem down into the main issues, you can see more clearly what you’re trying to do and perhaps find a solution. Now, in so doing, you may have stripped away the problem that you’re after. You may have simplified it to a point that it doesn’t even resemble the problem that you started with; but very often if you can solve this simple problem, you can add refinements to the solution of this until you get back to the solution of the one you started with.

Also, it goes without saying that if you can completely “decouple” various parts of your problem and have them interact only through neat interfaces, then you definitely should. But as Shannon alludes to above, even tightly coupled components can often be separated temporarily when trying to solve some problem at the heart of one of your components (again, I think a programming/​debugging analogy is apt).

Step back regularly

Sometimes when you’re working on a sub-problem for a long time, it’s possible to let yourself get “carried away” relative to the overall problem/​solution. You should step back every now and then so you can check your work against the set of high-confidence assumptions/​constraints that you’ve developed along your journey so far.

As a bad analogy, if you’re digging a trench, you’ll want to make sure you’re lifting your head out of the trench every now and then to ensure you’re still digging in roughly the right direction.

Start naive

If you’re not sure where/​how to start your journey towards solving a problem, I think a good general strategy is just to start with a “naive” solution—something that’s very simple, and kinda works, but obviously isn’t optimal. Allow yourself to start with a silly/​brute-force/​obviously-wrong solution that comes to mind. Then ask “What is it that makes this solution bad? Where and why does it fail?” And you can iterate towards better solutions from there. It almost doesn’t matter where you start.

I think this works (for me at least) because it creates an easy-to-understand pathway that builds up from something simple that I have a solid understanding of. I can see that my solution is just <simple idea> plus <feature X for reason Y> plus <trade-off Z for reason W>, and so on. Usually I’ll end up discarding that kind of narrative as better perspectives emerge, but it gets me started.

On arriving back where you started

In the search for a solution, if you end up back at a potential solution that you rejected a while ago, that’s generally a sign that you’re getting close to “mapping” the whole space of viable-looking solutions, and so you’re probably getting close to understanding the whole area at a higher level. It’s probably not a reason to be discouraged, even though it may give you the feeling that you’re “going in circles”.

What it generally means is that you now get to do a “second pass” over the branches of the search space, except this time with some new insights and perspectives under your belt. It’s unlikely that you’ve arrived back at an old idea with no new insights. You’re in a good position to spot flaws in your old ideas, and question old assumptions.

But note that if you’re “guessing and checking” (i.e. your exploration isn’t in small, well-thought-out steps), then you’ll probably end up going around in circles, and it won’t be productive. See the section called “Take tiny steps” for more discussion of that failure mode.

Develop strong, foundational mental models

You’ll often need to go down a few levels of abstraction to think about some property of your current approach and/​or solve a problem with it. If you haven’t spent the time to develop mental tools for thinking about those lower levels, then it won’t be as easy to “debug” flaws in your higher-level models.

Having bad low-level models results in a lot of guess-and-check, because if something doesn’t work, then you can’t go “down” and see why—so you just end up jumping to some other idea. It also makes it much harder to go “up” because you’re essentially trying to build a “tower” (of abstractions) on shaky/​leaky foundations.

Allow yourself to spend time pondering the “bare basics” and develop perspectives and mental tools which make them intuitive.

The big picture

(This section is closely related to the “check your map regularly” section and the “see your solution as just one point in solution space” section.)

Lower-level decisions that you make along your journey should always be checked against a “bigger picture” understanding. This can sort of be framed as a “breadth-first” search process. The purpose of using a breadth-first search is to help you build foundational understandings of your search space by inducing abstractions from many plausible paths. These abstractions can then be used to make informed choices about which specific approach to dig deeper on. This is as opposed to just going down random rabbit holes and hoping that the optimal solution is at the end. With the former strategy you’re essentially compressing the search space first (merging branches), and then moving down the most plausible branch.

Another way in which a breadth-first search can make sense: When you’ve got multiple problems/“weirdnesses” with your current model, don’t stubbornly push on just one of them for too long. It is often the case that several of your problems are closely related. In that case each problem might be a “porthole” that gives you a different vantage point on the same deeper, underlying issue. I’ve experienced this a few times: I’ve been working on a problem for a few days and the solution is becoming convoluted, and then I notice something that looks “familiar” (i.e. related to another problem that I’m having), so I switch over and look at that other problem, and very quickly come to the solution of the whole thing. The different “problem branches” were, in some sense, the same branch. If I’d only pushed on a single branch it would have been harder to converge on the “real” underlying problem.

Another related point/perspective: In general, be careful about building things in a very “piecemeal” or “reactive” way, where you’re adding consecutive “patches” to your product/solution without stepping back and giving much thought to how you might elegantly fit them all together using some broadly shared concepts/abstractions. If you don’t “step back” enough, your solution may come to resemble (in some abstract sense) a legacy code base, or a late-stage geocentric model of planetary orbits.

Actively foster motivation

As out of place and mundane as it might sound, this is arguably the most important point. If you lose motivation, nothing in your problem-solving arsenal can make up for it. Maintaining motivation may require active work as the months pass by. It’s not just about the risk of giving up (that may actually make sense at some point). The larger risk is that solving the problem becomes less fun, which will slow you down.[10]

It helps to envision the “reward” at the end of your journey, and trace that reward back to the stepping stones that you’re currently hopping along.[11] Each little bit of progress should translate to a little bit of excitement/​reward. It’ll completely depend on the specific type of problem you’re trying to solve and why you’re trying to solve it, but in my case it’s often helpful to have an exciting use-case or project idea in mind (that requires a solution to the problem I’m working on), and to regularly think about that and trace the reward back to where I am now.

If your problem domain is such that you’re able to implement/​concretely-test iteratively better solutions over time (on your way to the general/​perfect solution), then it might be worth doing that if it’s not too costly—if only for the motivation that you get from seeing your creation “in the real world” (in whatever sense) versus just on paper. Be sure to track your progress on important metrics with each iteration, and make them easily visible/​legible to you (e.g. with a graph on your wall). Human brains are motivated surprisingly well by “make this number go up”-type challenges.

One thing that helps me stay on track is to remove other sources of dopamine/reward that can drag my mind away from the problem. This likely won’t make sense for many people and problems, but I sometimes physically disconnect my WiFi router, or even go so far as to put the router power cord and my mobile SIM card in my letter box just to create a big-enough barrier between myself and distractions that seem well-justified, but absolutely aren’t. If I have unrelated or too-tangential questions that I want to Google, I’m forced to write them down in a list which I can check at the end of the day, instead of having the question interrupt my thinking and take me down a 30-minute rabbit hole of Wikipedia articles, or whatever. I’m not great at regulating myself if I don’t put barriers like this in place, but I think this varies a lot from person to person.

One final point: As I mentioned in the “Take tiny steps” section, it’s surprisingly easy to fool yourself into thinking that you’re on the right track if you’re taking large, poorly-motivated steps around the “solution space”. This is really bad for motivation in the long term. If you keep getting tricked over and over, your brain will start to distrust the rewards that you’re giving yourself each time you (incorrectly) feel like you’ve made some progress, and that leads to burnout.[12]

Don’t prematurely optimise

I’m pretty sure this applies in areas outside of programming and computer science, but I’ll keep it concrete and leave the abstraction to you: If you’re developing a model that’s intended to be executed by a computer, then it’s tempting to add “optimisations” to components of the model before you’ve even got a working model. You might see some super obvious way to go from N to sqrt(N) steps, but one that makes your model more conceptually complicated.

Avoid adding those sorts of optimisations until you’ve got a working model, because it may be that the “core” of your model is still not quite right, and reasoning towards a more correct “foundation” is much harder if you’re trying to “carry along” a bunch of optimisation-related adjustments/​corrections/​bandaids.[13]
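To give the “N to sqrt(N) steps” remark something concrete to hang on, here is a small Python sketch using primality testing by trial division. That choice of problem is mine, picked only because it has a well-known sqrt(N) shortcut, and the function names are made up. The point is the order of work: get the plain version right first, then use it to check the optimised one.

```python
from math import isqrt


def is_prime_naive(n):
    """Plain version: try every possible divisor below n (about n steps)."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, n))


def is_prime_sqrt(n):
    """Optimised version: any divisor above sqrt(n) pairs with one below it."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))


# Only trust the optimisation once it agrees with the plain version.
mismatches = [n for n in range(200) if is_prime_naive(n) != is_prime_sqrt(n)]
print(mismatches)  # []
```

If the core idea later turns out to be wrong, the plain version is also the easier one to reason about and repair.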

Get things down on paper

When reasoning in your mind, as opposed to on paper, you’re more limited in terms of short-term memory. You can move faster, but if you’re anything like me, you have a tendency to think in “hand-wavy” abstractions (even without realising it). This is fine sometimes, but if you spend too long reasoning without “reining everything in” with paper or keyboard, then you can get into a muddle. Writing forces you to make your thoughts more “concrete” and helps extend your short-term memory so you can work on certain aspects without trying to hold on to others at the same time. List out plausible approaches, draw a diagram of the thing you’re thinking about, write down your line of reasoning, list the conditions that need to be satisfied, write down questions that you need to get back to, etc.

This section could probably have been turned into a more general “make it more concrete to expose flaws” section. E.g. if you’re working on an algorithm, then actually attempting to code it up (or even just putting your mind in that “mode”) will probably help you generate problems that you hadn’t yet realised existed with your approach.

Work backwards a bit

It’s sometimes possible, in some sense, to see what properties the solution “leaf” (using the tree-search model of the solution space) should have, and then to work backwards at least a bit so that you’ve got a bit less work to do in your forward search.

Very abstractly, it’s like when you’re solving a maze (where you can see the whole maze from a bird’s-eye perspective). There’s sometimes some “free” progress to be made at the start and the end, because the paths don’t branch for a little bit. So it makes sense to draw a line from the end backwards until you reach the first branch point, at the very least.

This point is probably a special case of the “starting a jigsaw puzzle from the corner pieces” point in that it’s a way to get an “easy certainty” which can help you constrain your search.
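The maze version of this can be sketched in a few lines. This assumes the maze is given as an adjacency dictionary (a representation I’ve picked purely for illustration; the `maze` below and the function name are made up): starting from either end, follow the corridor until you hit the first branch point.

```python
def forced_path_from(graph: dict, start: str) -> list:
    """Follow the corridor from `start` until the first branch point or dead end."""
    path = [start]
    previous, current = None, start
    while True:
        # Everywhere we could go next without stepping straight back.
        options = [n for n in graph[current] if n != previous]
        if len(options) != 1:
            return path  # branch point (several options) or dead end (none)
        previous, current = current, options[0]
        if current in path:
            return path  # guard against walking around a loop forever
        path.append(current)


# Hypothetical maze: each cell lists the cells it connects to.
maze = {
    "start": ["a"],
    "a": ["start", "b", "c"],
    "b": ["a"],
    "c": ["a", "d"],
    "d": ["c", "e", "f"],
    "e": ["d"],
    "f": ["d", "g"],
    "g": ["f", "goal"],
    "goal": ["g"],
}

backward = forced_path_from(maze, "goal")   # free progress from the end
forward = forced_path_from(maze, "start")   # free progress from the start
```

In this toy maze the backward walk gets from "goal" to "d" without any decisions, so the real search only has to connect "a" to "d".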

Finding and inspecting inefficiencies

I’ve found it helpful to use special case examples to help quickly find inefficiencies in my approach that are clues to larger problems with the model. Look for ways in which your approach is “wasteful” in a special case. If you spot an inefficiency with one special case, the next step is to see if it applies to other special cases, and then to more general cases. An inefficiency that applies to just one special case probably isn’t an inefficiency—it’s likely just the cost of a solution that can handle more than just that one special case (generality isn’t free).

Note that a small inefficiency within a potential solution might not seem too bad, but it could be a clue that there’s something very fundamentally incorrect with your solution. Try to understand why that inefficiency exists—what are you getting in return for it? If you can’t see a net-positive trade-off that an observed inefficiency/​redundancy is making, then something might be wrong with some aspect of your approach.

Daydreaming about the problem/​solution

Initially, thinking about your problem will be hard work, but eventually you’ll have done enough compression/​abstraction, and turned it over in your mind enough from many different angles that you’ll probably find yourself “daydreaming” about your problem (or at least parts of it) without even realising it—e.g. while in the shower, falling asleep, exercising, etc. That’s a pretty good sign[14] - that’s where you want to get to. You’re probably close to solving it, or at least making some significant progress. If thinking about it requires a lot of mental effort, then you’re probably still in the phase where you’re pulling out abstractions and finding constraints, perspectives, etc. - i.e. setting up the “scaffolding” that you need to solve the problem or sub-problem.

See your solution as just one point in solution space

You should work to see the trade-offs your particular approach is making, and why it’s making them, and how you could change it to work better for a different set of requirements. Try to pull the abstract ideas out of sets of possible approaches so you can see ways in which they’re fundamentally the same. As mentioned elsewhere in this post, you essentially want to develop a “high-level map” of the solution space to guide you as you probe deeper on particular “concrete” approaches.

If you can’t “see” your approach as just one “choice” within a space of solutions, then you’re probably executing a “depth-first” search, which is great if you happen to hit upon the right approach on the first try, but that’s unlikely if it’s a hard problem.

Always attempt to generalise lessons

By analogy: If you’re writing a program and you come across an “interesting” bug in your code (i.e. not just a typo or something like that), you should always ask: “How did I make this mistake? What misunderstanding did I have?” and then you should wonder whether you made the same mistake elsewhere. You’re treating each bug as a more general lesson. This applies for mistakes in reasoning in general—not just for writing programs.

When you come across a new insight during your problem-solving process, you should be thinking “Why wasn’t this obvious to me? Are there some details of this insight that I can abstract away to get a more general insight?” If something is surprising when solving a problem, it might be a sign that you’re missing some deeper and therefore more broadly-applicable understandings.

One way to frame your search for a more general insight is to hunt for “analogies” for it in everything—look for other situations where a vaguely similar idea applies. You can then use the set of examples that you gather to induce the abstraction.

Another relevant Claude Shannon quote:

Can I apply the same principle in more general ways? Can I use this same clever idea represented here to solve a larger class of problems? Is there any place else that I can use this particular thing?

It probably goes without saying, but if you don’t generalise your lessons, then you’re just storing them inefficiently—you’ll end up with a bunch of “code duplication” in your brain, and new things that you learn in one domain won’t be able to “flow down” your abstraction hierarchy to give you lots of insights in different domains—every lesson will instead be isolated within its specific context.

Have multiple notebooks going at once

That way you can cleanly set one aspect of the problem aside and work on another. Switching between different aspects of the same problem in the same notebook is hard for me because things feel all tangled up, and I get (sensibly) worried that I’ll accidentally forget some crucial sub-problem or question that I left unresolved and thus that I’ll end up going off in the wrong direction. If you have an “aside” (some interesting thought/​question that just occurred to you), put it on a new piece of paper. If you have to use the same notebook for everything, then make sure you highlight it, or maybe use a sticky note or something like that. That way you can fully release it from your brain and get back to what you were doing.

Related to this point: I find it useful to have both written notebooks and computer documents going at the same time. I tend to use two different “styles” of thinking depending on which one I’m using. With computer documents I tend to write out my thoughts as I think them[15] (I can type much faster than I can write on paper), whereas with my notebook I tend to write a sentence or two, perhaps with a diagram, and then after a while, write another short line summarizing what I’ve been thinking, and so on. If you do use computer documents, it helps to get fast at typing by learning to touch type (it’s a great investment in general). You might also like to try Workflowy (or similar alternatives) - the nesting/​folding feature makes it very useful for keeping thoughts organised.

Also, it’s a good idea to highlight important questions/​problems and realisations (2 different colors), so you can easily skim back over your notes and see the key points.

Have you missed any constraints?

This point overlaps with several others in this document. Finding sneaky constraints/​assumptions seems to basically be “the” skill of solving hard problems. If you have a solution that seems to model your problem, but it seems needlessly complex, or hard to reason about, then it could be that you’re just missing some key fact that is always true—some variable that is actually always constant, something that can never happen, something that always “contains” something else, two variables that are actually independent under some condition, etc. And once you realise that, the solution collapses into something much more elegant and easy to reason about. Models are really just bundles of relationships/​constraints that apply to some variables, and so building models is fundamentally an exercise in finding those relationships.

Try explicitly laying out the constraints that you have, and ask what is implied if they’re true. Often it’s a pair of constraints which, together, imply some other (derived) constraint which unlocks a new perspective—a new way of thinking about the problem which makes important aspects simple and obvious.

A related question to ask yourself: Are there constraints/​relationships that you know about (they might be obvious ones), but which your solution doesn’t take advantage of?

Practice

It probably goes without saying, but being good at solving problems involves a lot of tacit knowledge that is developed through practice. Reading posts like this one should help to consolidate some vague patterns that you’ve been seeing during your problem solving sessions, but to get good at problem solving I think you need to spend a lot of time actually doing it.

Are you solving the right problem?

This is a meta point, but it’s probably the most dangerous failure mode in problem solving. You should be at least a little scared (but not paralysingly so) by the fact that some people devote their whole lives to solving problems that end up not being relevant in a decade due to a shifting paradigm, new technology, or something else. If you’re going to spend a bunch of time solving a problem, try to make sure it’s important, or is going to be important within some time frame that’s acceptable to you. Also, remember that the difficulty of a problem is not necessarily proportional to its importance.

The problems that you’ll tend to think are important are influenced by the “context” that you’re in, but that context may be on its “last legs”. Alan Kay put it like this, referring to his time at Xerox PARC:

Most problems are bogus because they come out of the current context. We’re trying to get beyond the current context.

Even if the context isn’t dying, it could still be constraining your thinking. If you’ve been working in a particular field for a few years, the problems that you see around you will be the problems of that field. Ditto for the technology that you’ve been using, and the social environment that you spend time in, and so on. So if the reason you’re trying to solve problems is to have an impact on the world, then you should care about your context because by choosing your environment, you are, to some extent, constraining the problems that are available to you—constrain wisely!

Other stuff

These are some short points on ideas that I haven’t got good explanations for yet, or that are more specific to solving certain types of problems, or which don’t deserve a whole section of their own, or which I’ve just added here because I’m too lazy to organize it into the proper section. If it makes sense for your particular problem:

  • Keep your mind open for solutions where the first step may actually look like a backwards step, or at least it may look a little “round-about” or not “optimal”. If you’re always looking in the “as the crow flies” direction, you may be missing some potential routes up the mountain that are easier[16] to traverse.

  • If you’re working on an algorithm, try to first get a sense of how computationally expensive it should be using Fermi-type approximations from a couple of different perspectives. If you’re accidentally underestimating the cost, you could be dismissing solutions that seem way too expensive when in actual fact, that’s just what’s required for the problem.

  • Lying down, closing your eyes, and relaxing for a bit can be helpful when you feel a bit muddled. I tend to relax faster by taking deep, slow breaths, holding for a moment, and feeling my body sink very slightly into the bed/​couch as I exhale (I’m not actually sinking, but kind of imagining that helps a bit for some reason).

  • In the section called “The big picture” I mention that two or more problems can be related, and that considering them together can help you triangulate on the “real” underlying problem, but sometimes two problems or “weirdnesses” with your model can actually “cancel out”. I.e. it looks like your model has a problem, but it’s actually not a problem due to another “problem” that (in some sense) “balances” it. So be sure to keep track of all the weirdnesses/​problems that your solution currently has, and check in on that list every now and then.

  • Be careful that you’re not falling into the trap of trying to work out an absolute value when in fact you only needed a relative one due to constraints/​assumptions that you didn’t realise that you could use.

  • Watch out for sneaky conflicts between pairs of assumptions/​constraints that you’re working on. You don’t realise that you’re working with an impossible situation and so you get sucked into a confusing mess until you finally realise that two of your assumptions are completely incompatible.

  • Seek similar known/​solved problems. Claude Shannon: “You have a problem P here and there is a solution S which you do not know yet perhaps over here. If you have experience in the field represented, that you are working in, you may perhaps know of a somewhat similar problem, call it P’, which has already been solved and which has a solution, S’, all you need to do—all you may have to do is find the analogy from P’ here to P and the same analogy from S’ to S in order to get back to the solution of the given problem. [...] if you are experienced in a field, you will know thousands of problems that have been solved. Your mental matrix will be filled with P’s and S’s unconnected here and you can find one which is tolerably close to the P that you are trying to solve and go over to the corresponding S’ in order to go back to the S you’re after.”

  • Not for everyone, but worth a try to see if it works for you: I tend to work best during long bouts of “hermit mode”, where I have basically no access to the internet (other than essentials like checking for critical emails) or to the outside world in general. I sometimes have to set strict meal times (no snacks) so I don’t subconsciously wander over to the fridge all the time. For the first day or two there’s a kind of “hangover” (because all my “easy dopamine” sources are gone), and then I can start to think more clearly. Not entirely relevant, but I’ve also noticed that I become more “nerdy” once I’m past the hangover stage. Things that would normally seem a bit boring to me start to become more interesting.

  • Very specific to certain problems, but: Try “unfolding” a temporal dimension into a spatial one to help you visualise what’s going on. Time-series plots are an obvious example of this, but there are less-obvious cases where the technique is useful—e.g. recurrent neural nets are often visualised in a non-recurrent manner by duplicating the network for each time-step. If you have a nested data structure, then you’ll likely want to use some sort of tree-like format rather than visualising things being “inside” other things (this sounds obvious, but there are times when it hasn’t been for me). Recursion and nesting are nice and elegant in math/​code, but they can be hard for the brain to reason about, and often there are useful perspectives to be found by “laying out” or “unfolding” the structure or algorithm.

  • I don’t think deadlines work for most types of “hard” problem solving—you probably want to use “milestones”/​“waypoints” to guide your progress instead.

  • (This is not medical advice, talk to your doctor first, or whatever:) Caffeine, Modafinil, and other easy-to-obtain, low-side-effect “concentration” drugs may be useful to you, depending on your tendency to lose focus, amongst other factors. I don’t drink coffee daily, but I use it every now and then when I need to get some “momentum”—either to get started on something, or to get out of a rut. (There should probably be a whole section in this post about how important “momentum” is for motivation.)

  • Very specific to certain problems, but: It can be appealing to design a system using many small/​“atomic” units, from which the desired behavior “emerges”, but just remember that this approach is making some trade-offs that don’t make sense in all cases. It seems elegant, but I think this elegance can lead you astray—the emergence isn’t “free”. Consider whether a “large atomic units”/​heterogeneous approach (where the units more “directly” create the desired system behavior) makes more sense for your requirements.

  • Be prepared to temporarily put the problem aside if you’re waaay over budget, time-wise. You may have bitten off more than you can chew, and there’s no shame in moving on to other things for a while. Working on a single problem for a long time will yield diminishing returns for your learning (which is probably what you should be optimising for, long-term). If you decide to come back to it later you’ll probably be better prepared to solve it thanks to your fresh mind and new mental models that you’ve since developed.

  • Learn about the important ideas/​models that have emerged in other fields. And not just the sciences—for example, the “composition vs inheritance” design choice that is talked about in game development has made its way into my thinking on topics that aren’t otherwise related to game dev.
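Going back to the Fermi-type estimate point in the list above, here is a rough sketch of what “a couple of different perspectives” might look like for an algorithm. Every number below is an assumed round figure chosen for illustration (item count, operations per item, machine speed, memory bandwidth), not a measurement:

```python
def estimate_seconds(work_units: float, units_per_second: float) -> float:
    """Back-of-the-envelope time estimate: total work divided by rate."""
    return work_units / units_per_second


# All of these are assumed, order-of-magnitude figures.
n_items = 1e8            # how many items the algorithm must touch
ops_per_item = 20        # rough guess at arithmetic needed per item
bytes_per_item = 8       # size of each item in memory
ops_per_second = 1e9     # ~1 billion simple operations per second
bytes_per_second = 1e10  # ~10 GB/s of memory bandwidth

# Perspective 1: how long does the arithmetic alone take?
compute_view = estimate_seconds(n_items * ops_per_item, ops_per_second)

# Perspective 2: how long does it take just to read every item once?
memory_view = estimate_seconds(n_items * bytes_per_item, bytes_per_second)

# Neither cost can be avoided, so the larger one is a rough floor.
rough_floor = max(compute_view, memory_view)
```

With these made-up numbers the arithmetic view gives about 2 seconds and the memory view about 0.08 seconds, so a candidate solution that takes a few seconds isn’t “way too expensive”; it’s roughly what the problem costs.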

Things to read/​watch

  • So You Want to Be a Research Scientist: Still useful to read, even if you, like me, aren’t actually a professional/​full-time researcher.

  • You and Your Research (video): A well-known lecture by Richard Hamming with some good ideas throughout. Mainly aimed at academics, but, again, definitely worth watching even if you’re not taking that path.

  • More generally, biographies (especially autobiographies) of people who did good work—ideally work that’s somewhat related to the stuff you’re interested in, but there are useful insights/​inspiration to pull from the life of basically anyone who has done good work (they’ll just be a bit more abstract).

  • (This list was going to be longer, I may add to it later if I get around to going through more of my old notes.)

  1. ^

    There is also some straight-up repetition. The original Google Doc was a lot worse (it was basically an append-only log of notes).

  2. ^

    For me, one clue that suggests I need another perspective is when I notice myself trying to lean on math/​equations too much—because the perspective /​ mental model that I have isn’t giving me a deep/​intuitive understanding.

  3. ^

    Of course, the jigsaw puzzle analogy isn’t perfect. For a lot of problems, it’s probably more like a stack of puzzles with increasing resolution (more pieces).

  4. ^

    It’s “safe” to anchor on an idea even if you’re not 100% sure it’s “correct”. If it’s not correct, then you’ll eventually find out; the process of anchoring on successive implications will eventually lead you to some sort of contradiction or obviously-incorrect implication.

  5. ^

    But note: There may be multiple possible “branches” to resolving your dead end—i.e. multiple things that don’t seem “right”. Make sure you don’t stubbornly over-focus on just one of them, since they may each be a different “porthole” through which you can look at the same underlying problem. I talk about this particular failure mode in the “big picture” section.

  6. ^

    Many of our mental models /​ intuitions are unnecessarily “siloed” into the problem-space that we learned them in. There are some really useful mental models that we learn from various parts of the world that hide an abstract model which is useful across a large range of other areas.

  7. ^

    E.g. “NewProduct is very similar to OldProduct, which failed, so NewProduct will fail.” or “The policy worked fine in StateA, so it will work fine in neighboring StateB.”

  8. ^

    In problem solving, you’re often the coder and the compiler. (But, of course, programming often involves a lot of high-level problem solving too.)

  9. ^

    It’s also possible that one of your assumptions is wrong, or that the special case is in some sense “misleading” you (e.g. see the section titled “beware of optimising for edge-case singularities”)

  10. ^

    I don’t want to give the impression that problem solving is meant to be fun 100% of the time. On net it can be fairly grueling for me, but there’s some underlying “joy” there—like when climbing a mountain. If you derive no pleasure from your problem then you’ll move more slowly.

  11. ^

    Your brain already does this “reward tracing” subconsciously (for the most part), but when you’ve had your head deep in your problem for a long time you might need to take the time to consciously remind yourself why it’s important to solve this.

  12. ^

    And, having mentioned burnout, I’ll just add: If you’ve been working on the problem for a long time and derived a lot of meaning from it, burnout can compound into (some degree of) depression if you’re not careful. You can feel aimless and unsure of whether to press on with it despite the slower progress due to the limited dopamine/​reward. To avoid this, I think it’s important to take some comfort in the fact that you’ve likely learned many (not-very-legible) lessons throughout the process, and to realise that if you shelve the problem for a year or two you’ll most likely lose no progress (assuming you kept notes summarising your ideas, of course), and will in fact probably be much better prepared to solve it at that point, owing to your fresh mind, and hopefully having worked in a tangentially related area in the meantime which will have given you new mental models that might be relevant to the problem. And assuming you do get back into it, I think it’s a good idea to just “poke” at the problem here and there until it “draws you in” /​ “nerd snipes” you, rather than launching into solving it in a very “ceremonial” way.

  13. ^

    For certain types of problems, there’s an interesting sense in which everything is “just an optimisation”, and so there’s a deeper discussion to be had here, but for now I think it’s enough to say that you should just make sure you’re quite confident in lower-level principles/​mechanics before trying to add higher-level/​second-order stuff on top that improves performance/​accuracy/​etc.

  14. ^

    I think this means that you’ve built your abstraction “tower” high enough (and therefore well enough, assuming you’re not fooling yourself) that you’re able to fit it all in your head and move smoothly up and down the levels, and “play” with a mental interactive animation of the relevant components at each level. That’s vaguely what it feels like for me.

  15. ^

    Having to concretely write them down “in real time” feels less like simply “recording” my thoughts, and more like an extra constraint on them—it’s harder to fool yourself if you have to put your thoughts “concretely” down on the page. Aside: One big disadvantage of computers is that it’s much harder to quickly create a diagram.

  16. ^

    This is probably stretching the mountain analogy a bit, but I think it is technically possible to just “head straight up the mountain” in a straight line—it’ll just be really, really hard. In my experience I very quickly get waist-deep in math (which is too deep for me) because I haven’t spent the effort to put the problem into a form which is amenable to my very human brain. There are “easy” paths up the mountain and you should definitely step back and find them rather than trying to “brute force” your way up.