TDT for Humans

alkjash, 28 Feb 2018 5:40 UTC

26 points

Exercises / Problem-Sets Decision Theory Subagents Practical Planning & Decision-Making

This is part 19 of 30 of Hammertime. Click here for the intro.

As is Hammertime tradition, I’m making a slight change of plans right around the scheduled time for Planning. My excuse this time:

Several commenters pointed out serious gaps in my knowledge of Focusing. I will postpone Internal Double Crux, an advanced form of Focusing, to the next cycle. Instead, we will have two more posts on making and executing long-term plans.

Day 19: TDT for Humans

Previously on planning: Day 8, Day 9, Day 10.

Today I’d like to describe two orders of approximation to a working decision theory for humans.

TDT 101

Background reading: How I Lost 100 Pounds Using TDT.

Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.

~ Eliezer

In other words, every time you make a decision, pre-commit to making the same decision in all conceptually similar situations in the future.

The striking value of TDT is this: make each decision as if you would immediately reap the long-term rewards of making that same decision repeatedly. And if it turns out you’re an updateless agent, this really works! You actually lose 100 pounds by making one decision.
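The credit assignment behind this can be made concrete in a few lines of Python. This is a minimal sketch with hypothetical numbers (0.25 pounds per decision, 399 more similar situations ahead), not a formalization of TDT: a one-instance view credits the choice with a single instance’s payoff, while the timeless view credits it with the payoff summed over every situation the same decision governs.

```python
def one_shot_payoff(per_instance: float) -> float:
    """Credit a choice with the payoff of this single instance only."""
    return per_instance


def timeless_payoff(per_instance: float, future_instances: int) -> float:
    """Credit a choice with this instance plus every conceptually similar
    future instance, since the same decision is made in all of them."""
    return per_instance * (1 + future_instances)


# Hypothetical numbers: skipping dessert is worth 0.25 pounds each time,
# and 399 more similar situations lie ahead.
print(one_shot_payoff(0.25))       # 0.25
print(timeless_payoff(0.25, 399))  # 100.0
```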

I encourage readers who have not tried to live by TDT to stop here and try it out for a week.

TDT 201

There are a number of serious differences between timeless agents and human beings, so applying TDT as stated above requires an unacceptable (to me) level of self-deception. My second order of approximation is to offer a practical and weak version of TDT based on the Solitaire Principle and Magic Brain Juice.

Three objections to applying TDT in real life:

Spirits

A human is about halfway between “one monolithic codebase” and “a loose confederation of spirits running a random serial dictatorship.” Roughly speaking, each spirit is the piece of you built to satisfy one primordial need: hunger, friendship, curiosity, justice. At any given time, only one or two of these spirits are present and making decisions. As such, even if each individual spirit is updateless and deterministic, you don’t get to make decisions for all the spirits currently inactive. You don’t have as much control over the other spirits as you would like.

Different spirits have access to different data and beliefs. I’ve mentioned, for example, that I have different personalities speaking Chinese and English. You can ask me what my favorite food is in English, and I’ll say dumplings, but the true answer, 饺子 (Chinese for dumplings), feels qualitatively better than “dumplings” by a wide margin.

Different spirits have different values. I have two friends who reliably provoke my “sadistic dick-measuring asshole” spirit. If human beings really have utility functions, this spirit has negative signs in front of the terms for other people. It’s uncharacteristically happy to engage in negative-sum games.

It’s almost impossible to predict when spirits will manifest. Recently, I was on a 13-hour flight back from China. I started marathoning Game of Thrones after exhausting the comedy section, and a full season of Cersei Lannister left me in “sadistic asshole” mode for an entire day afterwards. If Hainan Airlines had stocked more comedy movies, this might not have happened.

Spirits can lie dormant for months or years. Meeting up with high school friends this December, I fell into old roles and received effortless access to a large swathe of faded memories.

Conceptual Gerrymandering

Background reading: conceptual gerrymandering.

I can make a problem look either big or small by drawing either a big or small conceptual boundary around it, then identifying my problem with the conceptual boundary I’ve drawn.

TDT runs on an ambiguous “conceptual similarity” clause: you pre-commit to making the same decision in conceptually similar situations. Unfortunately, you will be prone to motivated reasoning and conceptual gerrymandering to get out of timeless pre-commitments made in the past.

This problem can be reduced but not solved by clearly stating boundaries. Life is too high-dimensional to even figure out what variables to care about, let alone where to draw the line for each of them. What information becomes salient is a function of your attention and noticing skills as much as of reality itself. These days, it’s almost routine to read an article that alters my capacity for attention enough to render situations I would previously have considered “conceptually similar” altogether distinct.

Magic Brain Juice

Background reading: Magic Brain Juice.

Every action you take is accompanied by an unintentional self-modification.

The human brain is finicky code that self-modifies every time it takes an action. The situation is even worse than this: your actions can shift your very values in surprising and illegible ways. This bug is fundamentally at odds with applying TDT as a human.

Self-modification happens in multiple ways. When I wrote Magic Brain Juice, I was referring to the immediate strengthening of neural pathways that are activated, and the corresponding decay through time of all pathways not activated. But other things happen under the hood. You get attached to a certain identity. You get sucked into the nearest attractor in the social web. And also:

Exposure therapy is a powerful and indiscriminate tool. You can reduce any aversion to almost zero just by voluntarily confronting it repeatedly. But you have fears and aversions in every direction!

Every move you make is exposure therapy in that direction.

That’s right.

Every voluntary decision nudges your comfort zone in that direction, squashing aversions (endorsed or otherwise) in its path.

Oops!

Solutions

I hope I’ve convinced you that the human brain is sufficiently broken that our intuitions about “updateless source code” don’t apply, and that trying to make decisions using TDT will be harder (and may have serious unintended side effects) as a result. What can be done?

First, I think it’s worth directly investing in TDT-like behaviors. Make conscious decisions to reinforce the spirits that are amenable to making and keeping pre-commitments. Make more legible decisions and clearly state conceptual boundaries. Explore virtue ethics or deontology. Zvi’s blog is a good place to start.

In the same vein, practice predicting your future behavior. If you can become your own Omega, the problems you face start looking Newcomb-like. Then you’ll be forced to give up CDT and the failures it entails.
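To see why accurate self-prediction forces the issue, here is the standard Newcomb expected-value arithmetic as a hedged sketch. It assumes the usual payouts ($1,000 in the transparent box; $1,000,000 in the opaque box only if Omega predicted one-boxing) and summarizes Omega as a single accuracy probability; the function names are illustrative. Evaluating each policy given that the prediction tracks it (the step CDT declines to take), one-boxing wins once accuracy exceeds about 50.05%.

```python
SMALL = 1_000       # transparent box: always yours to take
LARGE = 1_000_000   # opaque box: filled only if Omega predicted one-boxing


def expected_value(one_box: bool, accuracy: float) -> float:
    """Expected payoff of a policy when Omega predicts it correctly
    with probability `accuracy` (an assumed, simplified predictor)."""
    if one_box:
        # Correct prediction -> opaque box is full; wrong -> it is empty.
        return accuracy * LARGE
    # Two-boxing always collects SMALL; the opaque box is full only if Omega erred.
    return SMALL + (1 - accuracy) * LARGE


def break_even_accuracy() -> float:
    # Solve a * LARGE = SMALL + (1 - a) * LARGE for a.
    return (SMALL + LARGE) / (2 * LARGE)


print(break_even_accuracy())                                     # 0.5005
print(expected_value(True, 0.9) > expected_value(False, 0.9))    # True
print(expected_value(True, 0.5) > expected_value(False, 0.5))    # False
```

The better you predict yourself, the closer accuracy gets to 1 and the wider the gap in favor of the one-boxing policy.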

Second, I once proposed a model called the “Ten Percent Shift”:

The Ten Percent Shift is a thought experiment I’ve successfully pushed to System 1 that helps build long-term habits like blogging every day. It makes the assumption that each time you make a choice, it gets 10% easier.

Suppose there is a habit you want to build, such as going to the gym. You’ve drawn the pentagrams, sprinkled the pixie dust, and done the proper rituals to decide that the benefits clearly outweigh the costs and there are no superior alternatives. Nevertheless, the effort to make yourself go every day seems insurmountable.

You spend 100 units of willpower dragging yourself there on Day 1. Now, notice that you have magic brain juice on your side. On Day 2, it gets a little bit easier. You spend 90 units. On Day 3, it only costs 80.

A bit of math and a lot of magic brain juice later, you spend 550 units of willpower in the first 10 days, and the habit is free for the rest of time.
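The Ten Percent Shift arithmetic can be checked directly. This sketch assumes the linear reading that matches the 100, 90, 80 sequence above: each repetition lowers the cost by a fixed 10 units (10% of the Day 1 cost), so the habit becomes free on Day 11. The function name and its parameters are illustrative.

```python
from typing import List


def daily_costs(initial: int = 100, decrement: int = 10) -> List[int]:
    """Willpower spent on each day until the habit is free.

    Assumes the linear reading of the Ten Percent Shift: every repetition
    lowers the cost by a fixed `decrement` (10% of the Day 1 cost by default).
    """
    costs = []
    cost = initial
    while cost > 0:
        costs.append(cost)
        cost -= decrement
    return costs


costs = daily_costs()
print(len(costs), sum(costs))       # 10 550
print(sum(daily_costs(100, 1)))     # 5050
```

Setting decrement=1 gives a one-percent shift: 100 paid days and 5,050 units in total. The linear reading is what makes the habit eventually free; if instead each day cost 90% of the previous day’s, the cost would never reach zero, and the total would approach 100 / (1 - 0.9) = 1,000 units.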

The exact number is irrelevant, but I stand by this model as the proper weakening of TDT: act as if each single decision rewards you with 10% of the value of making that same decision indefinitely. One decision only loses you 10 pounds, and you need to make 10 consecutive decisions before you get to reap the full rewards.
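The weakened rule can be stated as a one-line function. A minimal sketch, assuming each consistent decision earns an equal share of the full timeless value and the credit caps at 100% once enough decisions have been made; `needed` is a hypothetical parameter standing in for the exact number, which the text treats as unimportant.

```python
def credited_value(full_value: float, decisions: int, needed: int = 10) -> float:
    """Value to credit after `decisions` consistent decisions under the
    weakened rule: each earns 1/needed of the full timeless value,
    capped once `needed` decisions have been made (an assumed cap)."""
    return full_value * min(decisions, needed) / needed


# 100 pounds of full value: one decision earns 10, ten decisions earn it all.
print(credited_value(100, 1), credited_value(100, 10), credited_value(100, 15))
# 10.0 100.0 100.0
```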

The Ten Percent Shift guards against spirits. Once you make the same decision 10 times in a row, you’ll have made it from a wide range of states of mind, and the exact context will have differed in every situation. You’ll probably have to convince a majority of spirits to agree with making the decision.
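One way to quantify this guard is an occupancy calculation. This is a sketch under loudly hypothetical assumptions: five spirits, and each decision made by one of them chosen uniformly at random (a literal reading of the “random serial dictatorship” metaphor). It computes, with exact fractions, the probability that a majority of spirits have each made at least one of the decisions.

```python
from fractions import Fraction


def distinct_spirit_distribution(spirits: int, decisions: int) -> list:
    """dist[j] = probability that exactly j distinct spirits made at least one
    of the decisions, assuming (hypothetically) that each decision is made by
    a spirit drawn uniformly at random."""
    dist = [Fraction(0)] * (spirits + 1)
    dist[0] = Fraction(1)
    for _ in range(decisions):
        nxt = [Fraction(0)] * (spirits + 1)
        for seen, p in enumerate(dist):
            if p == 0:
                continue
            # The next decision comes from a spirit that has already decided...
            nxt[seen] += p * Fraction(seen, spirits)
            # ...or from one that has not decided yet.
            if seen < spirits:
                nxt[seen + 1] += p * Fraction(spirits - seen, spirits)
        dist = nxt
    return dist


def prob_majority_consulted(spirits: int, decisions: int) -> Fraction:
    majority = spirits // 2 + 1
    return sum(distinct_spirit_distribution(spirits, decisions)[majority:], Fraction(0))


for n in (1, 3, 10):
    print(n, float(prob_majority_consulted(5, n)))
```

Under these assumptions a single decision never involves a majority of the five spirits, three decisions do so with probability 0.48, and ten decisions do so with probability of roughly 0.999.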

The Ten Percent Shift also guards against conceptual gerrymandering. Having made the same decision in ten different situations, you can unambiguously stake out the convex hull of those data points (a convex region of up to 9 dimensions) as a timeless pre-commitment.
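The convex-hull idea can be sketched in two dimensions, which keeps the geometry readable; real situations would need many more coordinates and a linear-programming membership test instead. The two context variables (say, energy level and free hours, each on a 0 to 10 scale) and all coordinates are hypothetical. The sketch builds the hull of ten past decision contexts with Andrew’s monotone chain algorithm, then checks whether a new situation falls inside it, i.e. whether the pre-commitment unambiguously covers it.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def covered(hull, q):
    """True if q lies inside or on the boundary of a counter-clockwise hull.
    Fewer than three vertices (too few or collinear past contexts) is treated
    as covering nothing, a simplification for this sketch."""
    if len(hull) < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % len(hull)], q) >= 0
               for i in range(len(hull)))


# Hypothetical (energy, free hours) contexts of ten past gym decisions.
past_contexts = [(1, 1), (9, 1), (9, 5), (1, 5), (3, 2),
                 (5, 3), (7, 4), (2, 4), (6, 2), (4, 4)]
hull = convex_hull(past_contexts)
print(hull)                   # [(1, 1), (9, 1), (9, 5), (1, 5)]
print(covered(hull, (5, 3)))  # True: already committed
print(covered(hull, (10, 3))) # False: a genuinely new situation
```

A situation outside the hull is genuinely new territory, which is exactly where motivated reasoning has room to redraw the boundary.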

Daily Challenge

This post is extremely tentative and theoretical, so I’ll just open up the floor for discussion.

7 comments, 5 min read

Qiaochu_Yuan 28 Feb 2018 6:28 UTC
9 points
In other words, every time you make a decision, pre-commit to making the same decision in all conceptually similar situations in the future.
That’s not quite how I would put it. I think the basic insight is: notice that you already will make the same decision in all similar situations in the future, and weigh the costs and benefits of doing that. (Especially striking in examples like dieting: you’re not just eating the extra fry now, you’re eating all of the extra fries that you’ll eat from being the sort of person who eats extra fries in situations like now.)
I think all of the modifications you propose are reasonable.
- alkjash 28 Feb 2018 15:59 UTC
  9 points
  I almost like that formulation better. The problem I wanted to highlight with my wording (although perhaps my wording is not the most effective way to do so) is that there’s something inherently dissimilar about the current situation: you’re making a conscious decision about it. In a world where you’re trying to diet, you still make conscious decisions only a small fraction of the time you eat food.
  So your TDT calculation needs to take as input the fact that you currently have the slack/willpower to think about it, and your attention has been drawn to the decision for the particular thing. And this becomes far less generalizable.
tcheasdfjkl 17 Aug 2018 0:25 UTC
6 points
One issue with the TDT framework for ordinary decisions/actions is that “every time you take an action (or don’t), you become more the kind of person who takes (or doesn’t take) that action” can put a huge weight on every action in a way that can become psychologically crushing. Decisions become agonizingly hard; one suboptimal action can cause a spiral of shame and discouragement and pessimism that makes it harder to do things in the future because you’ve already updated so much towards you being incapable of doing things (in addition to just being extremely unpleasant). If not going to the gym is an indication that you are the kind of person who doesn’t go to the gym, then if one time you fail to go to the gym you think of yourself even more as the kind of person who doesn’t go to the gym, which can become part of your identity/how you think of yourself, which can make it even harder to change (and since many people imbue going to the gym with moral valence, this can cash out to “I am a bad/useless person”).
A lot of people need to learn the opposite of TDT first, if they’re stuck in that kind of spiral. One instance of going to the gym is one instance of going to the gym, missing this one time doesn’t mean you’re doomed to never make it to the gym again, you are still a person with potential even if you didn’t use this one afternoon in a particularly effective way. (This is basically unlearning the all-or-nothing thinking that is often a big part of depression reasoning.)
At the same time, yes, the things you say here are true; and truly unlearning all-or-nothing thinking means learning this while also not imbuing your actions with more power than they actually have. What you do today doesn’t determine what you will do for all time going forward, but it does have some effect on your future actions, and as such what you do today is one possible intervention point for changing what your future life will look like.
I guess the only change I’d make to how people talk about TDT for humans is to not make claims that are stronger than reality. “If you make this decision today, you will also make the same decision in all analogous situations” is not actually literally true. “If you make this decision today, you make it more likely that you will also make the same decision in future analogous situations” is truer and less fatalistic.
theme_arrow 24 Jan 2021 23:59 UTC
3 points
I like your weakened version of TDT; it feels like it really captures something salient about human decision-making. I recognize that the exact number isn’t really important, but I think I’d describe it as closer to a one-percent shift than a ten-percent shift for myself. I feel like I personally have taken a very long time to go from the first few times I do something to that thing feeling natural. I wonder if that’s something that tends to differ a lot between people or different kinds of actions.
tcheasdfjkl 17 Aug 2018 18:58 UTC
3 points
Oh, another thought—it’s only mostly true that taking an action is exposure therapy towards that action being more okay/likely in the future. If you take an action and it results in a sharply bad experience for you, it may make you less likely to take that action in the future. That’s why comfort zones are best expanded carefully, because if you go out on too much of a limb and hurt yourself, your comfort zone can shrink instead.
tcheasdfjkl 17 Aug 2018 1:25 UTC
3 points
There was no specific exercise to go with this entry, but I decided to use this as a prompt to take some time to think about what helpful and unhelpful “I am the kind of person who [...]” self-images I have, whether I have already done some things to challenge the unhelpful ones (and try updating on that), and what further actions I could take to challenge the unhelpful ones and strengthen the helpful ones. I also thought about what helpful self-images of this sort I don’t quite already have but aren’t counter to my current ones and seem within reach, and how I might make them true.
(I know this post doesn’t really focus on self-image and is more about direct relationships between past and future action—but I do think that self-image is one of the mechanisms that fuel that relationship, in addition to directly affecting people’s well-being, so it’s worth spending some thought on. I did find this a useful line of inquiry.)
itavero 11 Dec 2020 0:46 UTC
1 point
Hitting the “0” willpower level might be when something becomes part of your identity. It costs me willpower to not do cardio or lift for more than a day or so—something feels off and aversive about it, just like it used to in the reverse situation. The only downside is that I’m not very excited or proud of myself for doing a workout, it’s just “normal”. But that’s life! I’m still glad I do it.
There are a few habits that I seem to drop once they get easy to do—I did yoga every day for a while but once it got easy and habitual I got bored and stopped. In retrospect I should’ve found a way to raise the difficulty, but for some reason that wasn’t clear at the time. Consciously planning around “what should I do if this gets boring” would probably help.