Backchaining causes wishful thinking

Wishful thinking—believing things that make you happy—may be a result of adapting an old cognitive mechanism to new content.

Obvious, well-known stuff

The world is a complicated place. When we first arrive, we don’t understand it at all; we can’t even recognize objects or move our arms and legs reliably. Gradually, we make sense of it by building categories of perceptions and objects and events and feelings that resemble each other. Then, instead of processing every detail of a new situation, we just have to decide which category it’s closest to, and what we do with things in that category. Most, possibly all, categories can be built using unsupervised learning, just by noting statistical regularities and clustering.

If we want to be more than finite-state automata, we also need to learn how to notice which things and events might be useful or dangerous, and make predictions, and form plans. There are logic-based ways of doing this, and there are also statistical methods. There’s good evidence that the human dopaminergic system uses one of these statistical methods, temporal difference learning (TD). TD is a backchaining method: First it learns what state or action Gn-1 usually comes just before reaching a goal Gn, and then what Gn-2 usually comes just before Gn-1, etc. Many other learning methods use backchaining, including backpropagation, bucket brigade, and spreading activation. These learning methods need a label or signal, during or after some series of events, saying whether the result was good or bad.
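To make the backchaining concrete, here is a minimal sketch of tabular TD(0) on an invented toy chain of states. The state names, rewards, and learning parameters are all made up for illustration; this is not a claim about how brains implement it.

```python
# Toy chain of states leading to a goal: s0 -> s1 -> s2 -> s3 -> GOAL.
# Reaching GOAL yields reward 1; every other transition yields 0.
STATES = ["s0", "s1", "s2", "s3", "GOAL"]
ALPHA, GAMMA = 0.1, 0.9            # learning rate, discount factor
V = {s: 0.0 for s in STATES}       # value estimates, all start at zero

def run_episode():
    """One pass down the chain, applying the TD(0) update at each step."""
    for i in range(len(STATES) - 1):
        s, s_next = STATES[i], STATES[i + 1]
        reward = 1.0 if s_next == "GOAL" else 0.0
        # TD(0): nudge V(s) toward reward + discounted value of the next state.
        V[s] += ALPHA * (reward + GAMMA * V[s_next] - V[s])

for episode in range(1, 201):
    run_episode()
    if episode in (1, 5, 50, 200):
        print(episode, {s: round(v, 3) for s, v in V.items()})
# After episode 1, only s3 (the state just before the goal) has learned any value;
# over later episodes the credit propagates backward to s2, then s1, then s0.
```

The backchaining shows up in the printout: value appears first at the state adjacent to the goal and creeps backward one state at a time, exactly the G(n-1)-before-G(n) pattern described above.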

I don’t know why we have consciousness, and I don’t know what determines which kinds of learning require conscious attention. For those that do, the signals produce some variety of pleasure or pain. We learn to pay attention to things associated with pleasure or pain, and for planning, we may use TD to build something analogous to a Markov process (sorry, I found no good link; and Wikipedia’s entry on Markov chain is not what you want) where, given a sequence of the previous n states or actions (A1, A2, … An), the probability of taking action A is proportional to the expected (pleasure − pain) for the sequence (A1, … An, A). In short, we learn to do things that make us feel better.
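A toy sketch of that kind of policy, where the chance of choosing an action is proportional to its learned expected (pleasure − pain) for the current history; the action names and numbers are invented for illustration:

```python
import random

# Learned net affect (expected pleasure minus pain) for each candidate action,
# given the same recent history (A1, ..., An). Values are invented.
expected_net_affect = {
    "eat_berries":   0.6,
    "poke_termites": 0.3,
    "nap":           0.1,
}

def choose_action(estimates):
    """Pick an action with probability proportional to its expected net affect."""
    actions = list(estimates)
    weights = [max(estimates[a], 0.0) for a in actions]  # ignore net-negative options
    return random.choices(actions, weights=weights, k=1)[0]

print(choose_action(expected_net_affect))  # "eat_berries" about 60% of the time
```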

Less-obvious stuff

Here’s a key point which is overlooked (or specifically denied) by most AI architectures: Believing is an action. Building an inference chain is not just like constructing a plan; it’s the same thing, probably done by the same algorithm. Constructing a plan includes inferential steps, and inference often inserts action steps to make observations and reduce our uncertainty.

Actions, including the “believe” action, have preconditions. When building a plan, you need to find actions that achieve those preconditions. You don’t need to look for things that defeat them. With actions, this isn’t much of a problem, because actions are pretty reliable. If you put a rock in the fire, you don’t need to weigh the evidence for and against the proposition that the rock is now in the fire. If you put a stick in a termite mound, it may or may not come out covered in termites. You don’t need to compute the odds that the stick was inserted correctly, or the expected number of termites; you pull it out and look at the stick. If something causes it not to come out covered in termites, such as being the wrong sort of stick, the cause is probably simple enough to add to your preconditions for next time.
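Here is a stripped-down backward-chaining planner in the spirit of this argument, on an invented termite-fishing domain. To add a step, it only looks for some operator that achieves each precondition; it never searches for anything that might defeat the step.

```python
# Each operator: name -> (preconditions, effect). Toy domain, invented names.
OPERATORS = {
    "pull_stick_out": ({"stick_in_mound"}, "have_termites"),
    "put_stick_in":   ({"have_stick"},     "stick_in_mound"),
    "find_stick":     (set(),              "have_stick"),
}

def plan_for(goal, known_facts):
    """Backward-chain: find any operator whose effect is the goal, then recurse
    on its preconditions. Note what is absent: no search for conditions or
    events that might defeat the chosen operator."""
    if goal in known_facts:
        return []
    for name, (preconds, effect) in OPERATORS.items():
        if effect == goal:
            steps = []
            for p in preconds:
                sub = plan_for(p, known_facts)
                if sub is None:
                    break
                steps += sub
            else:
                return steps + [name]
    return None  # no operator produces this goal

print(plan_for("have_termites", known_facts=set()))
# -> ['find_stick', 'put_stick_in', 'pull_stick_out']
```

The same loop works unchanged if some of the operators are “believe” steps rather than physical actions, which is where the trouble starts.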

You don’t need to consider all the ways that your actions could be thwarted until you start doing adversarial planning, which can’t happen until you’ve already started incorporating belief actions into your planning. (A tiger needs to consider which ways a wildebeest might run to avoid it, but probably doesn’t need to model the wildebeest’s beliefs and use min-max—at least, not to any significant depth. Some mammals do some adversarial planning and modelling of belief states; I wouldn’t be surprised if squirrels avoid burying their nuts when other squirrels are looking. But the domains and actors are simpler, so the process shouldn’t break down as often as it does in humans.)

When we evolved the ability to make extensive use of belief actions, we probably took our existing plan-construction mechanism and simply added “believe” to its repertoire of actions. But an inference is a lot less certain than an action. You’re allowed to insert a “believe” act into your plan if you can find just one thing, belief or action, that plausibly satisfies its preconditions. You’re not required to spend any time looking for things that refute that belief. Your mind doesn’t know that beliefs are fundamentally different from actions: the truth-values of the propositions describing an action’s expected effects are strongly and causally correlated with whether you execute the action, while the truth-values of your possible belief-actions are not; they can be made true or false by many other factors.

You can string a long series of actions together into a plan. If an action fails, you’ll usually notice, and you can stop and retry or replan. Similarly, you can string a long series of belief actions together, even if the probability of each one is only a little above .5, and your planning algorithm won’t complain, because stringing a long series of actions together has worked pretty well in your evolutionary past. But you don’t usually get immediate feedback after believing something that tells you whether believing “succeeded” (deposited something in your mind that successfully matches the real world); so it doesn’t work.
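To make the compounding explicit with a back-of-the-envelope calculation (the step probabilities are invented): a chain of fairly reliable physical actions still mostly works, while a chain of barely-better-than-chance belief steps is almost never true all the way through, yet a precondition-satisfying planner accepts both just as readily.

```python
# Probability that an entire chain holds, if every step must hold independently.
def chain_success(p_step, n_steps):
    return p_step ** n_steps

for label, p in [("physical actions", 0.95), ("belief actions", 0.6)]:
    print(label, [round(chain_success(p, n), 3) for n in (1, 5, 10)])
# physical actions [0.95, 0.774, 0.599]  -- a 10-step plan still works more often than not
# belief actions   [0.6, 0.078, 0.006]   -- a 10-step inference chain is almost never all true
```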

The old way of backchaining, by just trying to satisfy preconditions, doesn’t work well with our new mental content. But we haven’t evolved anything better yet. If we had, chess would seem easy.

Summary

Wishful thinking is a state-space-reduction heuristic. Your ancestors’ minds searched for actions that would enable actions that would make them feel good. Your mind, therefore, searches for beliefs that will enable beliefs that will make you feel good. It doesn’t search for beliefs that will refute them.

(A forward-chaining planner wouldn’t suffer this bias. It probably wouldn’t get anything done, either, as its search space would be vast.)