Thoughts on designing policies for oneself

Note: This was originally written in relation to this rather scary comment of lukeprog’s on value drift. I’m now less certain that operant conditioning is a significant cause of value drift (leaning towards near/far-type explanations), but I decided to share my thoughts on the topic of policy design anyway.


Several years ago, I had a reddit problem. I’d check reddit instead of working on important stuff. The more I browsed the site, the shorter my attention span got. The shorter my attention span got, the harder it was for me to find things that were enjoyable to read. Instead of being rejuvenating, I found reddit to be addictive, unsatisfying, and frustrating. Every time I thought to myself that I really should stop, there was always just one more thing to click on.

So I installed LeechBlock and blocked reddit at all hours. That worked really well… for a while.

Occasionally I wanted to dig up something I remembered seeing on reddit. (This wasn’t always bad—in some cases I was looking up something related to stuff I was working on.) I tried a few different policies for dealing with this. All of them basically amounted to inconveniencing myself in some way or another whenever I wanted to dig something up.

After a few weeks, I no longer felt the urge to check reddit compulsively. And after a few months, I hardly even remembered what it was like to be an addict.

However, my inconvenience barriers were still present, and they were, well, inconvenient. It really was pretty annoying to make an entry in my notebook describing why I was visiting and start up a different browser just to check something. I figured I could always turn LeechBlock on again if necessary, so I removed my self-imposed barriers. And slid back into addiction.

After a while, I got sick of being addicted again and decided to do something about it (again). Interestingly, I forgot my earlier thought that I could just turn LeechBlock on again easily. Instead, thinking about LeechBlock made me feel hopeless because it seemed like it ultimately hadn’t worked. But I did try it again, and the entire cycle then finished repeating itself: I got un-addicted, I removed LeechBlock, I got re-addicted.

This may seem like a surprising lack of self-awareness. All I can say is: Every second my brain gathers tons of sensory data and discards the vast majority of it. Narratives like the one you’re reading right now don’t get constructed on the fly automatically. Maybe if I had been following orthonormal’s advice of keeping and monitoring a record of life changes attempted, I would’ve thought to try something different.

Anyway, what finally worked was setting up a site blocker that blocked the reddit.com homepage only. There was no inconvenience associated with visiting other pages, so the “willpower upkeep cost” of this policy was pretty minimal. I drew a mental “line in the sand” prohibiting me from ever loading a web page just to see what had changed on it (excluding email and some other stuff), and this rough heuristic (which I’ve since relaxed into an informal rule without ill effects) has served me well ever since.
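To make the “homepage only” rule concrete, here is a minimal Python sketch of the matching logic a site blocker could apply. It is an illustration under stated assumptions, not the configuration of any particular blocker: the host list and the function name `is_blocked` are hypothetical, and a real browser extension would express the same rule in its own settings format.

```python
from urllib.parse import urlparse

# Hypothetical blocklist: only the *front page* of these hosts is blocked.
BLOCKED_HOMEPAGE_HOSTS = {"reddit.com", "www.reddit.com", "old.reddit.com"}


def is_blocked(url: str) -> bool:
    """Return True only when `url` is the bare homepage of a blocked host.

    Deep links (individual posts, comment threads, search results) stay
    reachable, so looking something up costs no willpower at all.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # "https://reddit.com", "https://reddit.com/" and
    # "https://reddit.com/?foo=bar" all count as the homepage.
    path = parsed.path.rstrip("/")
    return host in BLOCKED_HOMEPAGE_HOSTS and path == ""
```

For example, `is_blocked("https://www.reddit.com/")` is true, while a link to a specific comment thread is not. The low upkeep comes from the asymmetry: the variable-reinforcement feed is gone, but nothing you might legitimately need is.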

The point of this anecdote is: having well-designed policies matters. Just as the laws of a nation or the rules of a board game matter a great deal, so do the policies you set up for yourself to follow. (“Consequentialism is what’s correct; virtue ethics is what works for humans.”)

You might be wondering how I read Less Wrong, since it’s a web page that changes. Less Wrong is a tough one, because it’s got the variable reinforcement that makes reddit addictive, but hanging out here can also be a pretty good use of time. Lately what I’ve been doing is using Google Reader as one of my go-to break activities, and stuffing it so full of feeds that there’s a growing backlog of interesting stuff to read every time I visit. The idea here is to have a constant reinforcer instead of a variable one, and it seems to work as far as avoiding addiction is concerned.

Policy design tips

My reddit experience illustrates a few recommendations for designing policies:

  • Make the willpower upkeep cost of your policy as low as possible. The higher the upkeep cost, the greater the chance that you’ll lack the willpower to uphold it at some point, and thereby lose the cognitive momentum you’ve built up behind it. Willpower spent on upkeep is also willpower that can’t be spent on other stuff. (Yes, I know that the resource model of willpower isn’t perfect, but it seems pretty descriptive in my case and I haven’t figured out how to subvert it significantly. If you have, please write a post about it!) I think this is why the “cheat days” in diets like Tim Ferriss’s work: eating whatever you want one day per week lowers the diet’s willpower upkeep from impossible to bearable.

  • Don’t berate yourself; debug your policies. Just as a program rarely works correctly on its first run, successfully acting like an agent on the first try is not the default. As in programming, you should expect to fix some bugs before getting something that works. (Another way to think about it: you’re trying to build a multi-story structure on a planet with extremely high gravity. The high gravity represents the strength of your instincts to just do whatever feels good and doesn’t feel bad. Your first few structures fall down, but eventually you figure out a blueprint for a structure that stays up. Even if it falls later, that doesn’t matter too much ’cause you can just rebuild it, probably with a few improvements.)

  • If possible, have a clear line between policy-compliant and policy-breaking behaviour, to guard against slippery slopes.

The rest of this post is going to consist of more policy design advice. I don’t remember the policy design attempts that spawned each piece of advice, and my advice may not work for you. But hopefully it will make for a good starting point for your own policy design experiments.

Consistency

An overarching principle: as much as possible, you want there to be consistency between what you tell yourself to do and what you actually do. If you’ve been telling yourself to do something and it’s not working, stop. Step back, gain some self-awareness, get creative, and try to figure out some other way to modify your behaviour.

Why is this so important? Because ignoring what you tell yourself to do is a really bad habit. Let’s say I’m trying to lose weight. Every morning I tell myself that I shouldn’t eat a cookie at lunch, and every afternoon I give in and eat a cookie. This amounts to reinforcing the behaviour of rationalizing my way around my diet! The more times I rationalize my way around my diet and get rewarded with a tasty cookie, the stronger my habit of breaking diets is going to become. It might even be a good idea for me to stop dieting entirely for a while, until the behaviour of rationalizing my way around my diet dies off.

I also think the game-theoretic view of time-inconsistency is useful. If you build up a track record of self-cooperation, following pre-commitments becomes easier because you know that by breaking the pre-commitment, you’ll be destroying something valuable. Part of this is not making excessive demands on yourself so that track record can actually be built up in the first place. See also: How I Lost 100 Pounds Using TDT.

If you keep these arguments in mind when your brain starts making “just this once” type arguments, hopefully you’ll be better at resisting them.

Translating guilt into policy ideas

  • Instead of feeling guilty about something you find yourself doing, ask yourself, “What policy should I make?” Then continue doing the activity guiltlessly and implement the policy later, when you’re feeling energetic. (Implementing the policy later means you’ll be less likely to break it right after having made it.) Untargeted guilt isn’t very useful, and it’s especially useless if you’re going to do the behaviour anyway. It’s much better to translate your vague suggestions for yourself into a set of specific guidelines, and then put enough momentum behind those guidelines that they don’t take much willpower to follow.

  • For me, there seems to be a very strong effect where if I make a policy when I’m not feeling very high-willpower, I won’t take it seriously and will ignore it later on. So I recommend just noting down policy ideas if you’re feeling tired. Then you can refine them and commit to actually following them later on.

Refining policy ideas

(Suggestion: Refer back to this list when you’re in a high-energy state and you’ve got a policy you want to implement.)

  • If the policy is important to you, it’s worth spending a while thinking about implementation details. The longer you spend thinking about and refining your policy, the more cognitive momentum you’ll have behind it.

  • Keep willpower upkeep low, as mentioned above.

  • Do brainstorming in a text document and finish with a written description of your policy.

  • As you think about your policy, brainstorm a to-do list of ways you could modify your environment to make following your policy easier (ex.: throw out your cigarettes, tape reminders on stuff).

  • If you don’t remember that you made a policy, then violating it probably shouldn’t count as a “real” violation. Your memory isn’t perfect, so why hold yourself responsible for what you can’t control?

  • For each policy you make for yourself, I recommend giving yourself two ways to change it. The “slow” way: make a change, and it comes into effect X hours later. The “fast” way: think of a change, think as hard as you can for Y minutes about why it might be a bad change, and if at the end of those Y minutes it would still seem like a good change to the self that made the policy, make the change. You’ll have to decide on X and Y when you make the policy. Policy changes should be reflected in your text document. It may be a good idea to have a “dry run” for a few days with Y = 0 or something like that.

  • It also may be a good idea to have two standards for yourself: a carefully-defined “formal” standard, and a higher “informal” standard that isn’t as rigorously specified. Try to anchor on your informal standard and follow it in practice, but count it as a win every time you do better than your formal standard. (Think of your formal standard as being at zero, and your informal standard as being at some positive number. Ideally you should have periodic feelings of pride for beating your formal standard, reinforcing the behaviour of following your policy.)
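The two-speed change procedure from the list above (a “slow” path with an X-hour delay and a “fast” path that requires Y minutes of deliberation) can be modeled as a small data structure. This is a hypothetical Python sketch, not a tool the procedure depends on: the class name `Policy`, its fields, and the `endorsed_by_original_self` flag (standing in for your own judgment at the end of the Y minutes) are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Policy:
    """A self-imposed policy with a slow and a fast way to change it.

    slow_delay plays the role of "X hours" and fast_deliberation of
    "Y minutes"; both are fixed when the policy is made.
    """
    text: str
    slow_delay: timedelta
    fast_deliberation: timedelta
    pending: list = field(default_factory=list)  # (effective_time, new_text)
    history: list = field(default_factory=list)  # mirrors the written policy document

    def propose_slow(self, new_text: str, now: datetime) -> None:
        """Slow path: queue the change; it takes effect after slow_delay."""
        self.pending.append((now + self.slow_delay, new_text))
        self.history.append((now, "slow change proposed", new_text))

    def apply_fast(self, new_text: str, started: datetime, finished: datetime,
                   endorsed_by_original_self: bool) -> bool:
        """Fast path: apply immediately, but only after deliberating for at
        least fast_deliberation and judging that the self who made the
        policy would agree. Returns True if the change was applied."""
        if finished - started < self.fast_deliberation:
            return False  # not enough time spent arguing against the change
        if not endorsed_by_original_self:
            return False
        self.text = new_text
        self.history.append((finished, "fast change applied", new_text))
        return True

    def current(self, now: datetime) -> str:
        """Return the policy in force at `now`, promoting slow changes that are due."""
        due = sorted((p for p in self.pending if p[0] <= now), key=lambda p: p[0])
        for effective, new_text in due:
            self.text = new_text
            self.history.append((effective, "slow change took effect", new_text))
        self.pending = [p for p in self.pending if p[0] > now]
        return self.text
```

Setting `fast_deliberation` to `timedelta(0)` corresponds to the “dry run with Y = 0” suggestion: any change you still endorse goes through immediately, while the history list keeps the written record of every change.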

Tips on repairing broken policies

Hopefully this won’t actually happen, but let’s say you broke your policy. What now?

  • Hold the line. Decide that whatever you did to break the policy is allowed for now, but keep the rest of the policy intact.

  • At some point later on, when you’re feeling energetic, restore your original policy and try to improve it to prevent the particular failure mode you encountered.

I’ve additionally found regular meditation to be useful for maintaining policies. (Sam Harris on meditation.)

Conclusion

It’s been over 6 months since I wrote this article. Here’s what my internet distraction policy has evolved into (it’s been stable for at least the past few months, so I thought it might be worth sharing).

I have a list of websites that I’ve classified as “distracting”, which includes reddit, Less Wrong, and Facebook, but not my email (it’s too useful to restrict, and I’ve been able to live with having that one distraction). If I have a reason to visit one of these sites, I write a log entry in my notebook explaining the reason and then visit. Sometimes the reason is just “I could use a break right now”, and so far using this reason hasn’t caused any problems. (If it did, I would probably have to change my policy and hammer out what constitutes a valid reason.) Most days I also open all of the distracting websites on my list in tabs after 11 PM (after a one-minute delay), which means I regularly check my LW, reddit, and email inboxes and don’t have to worry about missing anything important.

For Hacker News in particular, I came up with a more unusual solution: I have a server that’s set up to spider the HN homepage every half hour. I originally did this with the intent of writing a software tool to browse the homepage archives and filter out all but the best content, but so far I haven’t gotten around to this.
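The half-hourly HN spider is simple enough to sketch. What follows is an illustrative Python sketch under stated assumptions rather than the actual server setup: it uses only the standard library, stores each snapshot as a timestamped file in a local directory, and uses a plain sleep loop (a cron job or systemd timer would likely be a sturdier scheduler in practice). The names `fetch`, `save_snapshot`, and `run` are hypothetical, and the filtering tool is not attempted.

```python
import time
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

HN_URL = "https://news.ycombinator.com/"
INTERVAL_SECONDS = 30 * 60  # "every half hour"


def fetch(url: str = HN_URL, timeout: float = 30) -> bytes:
    """Download one copy of the page."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def save_snapshot(html: bytes, archive_dir: Path, now: datetime) -> Path:
    """Write one timestamped snapshot into archive_dir and return its path."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_dir / now.strftime("hn-%Y%m%dT%H%M%SZ.html")
    path.write_bytes(html)
    return path


def run(archive_dir: Path, fetch_page=fetch, sleep=time.sleep, max_runs=None) -> None:
    """Snapshot the homepage every INTERVAL_SECONDS; max_runs=None loops forever."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            save_snapshot(fetch_page(), archive_dir, datetime.now(timezone.utc))
        except OSError as exc:  # network or disk errors (URLError is an OSError): retry next cycle
            print(f"fetch failed: {exc}")
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(INTERVAL_SECONDS)
```

Calling `run(Path("hn-archive"))` would snapshot the homepage indefinitely. The `fetch_page` and `sleep` parameters exist so the loop can be exercised without touching the network, and keeping raw HTML leaves the later filtering step free to parse the archive however it likes.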