Reinforcement and Short-Term Rewards as Anti-Akratic

Related: Time and Effort Discounting, Akrasia, Hyperbolic Discounting, and Picoeconomics, The Power of Reinforcement, Basics of Animal Reinforcement, Basics of Human Reinforcement

I built a robot that feeds me candy when I get work done, to try to solve my akrasia problem. And, so far, it seems like it might actually work.

Naturally, the story starts with procrastination. I finish things the night before they're due. Or, sometimes, I don't. I'd like to fix that. One theory explains procrastination as a result of discounting, the idea that human brains discount long-term rewards in favor of short-term ones. For instance, my brain prefers watching Neon Genesis Evangelion now over nearly missing my project deadline in a few days. The same principle applies to consequences, and there are already tools built to combat it, most notably BeeMinder. Its tagline, "bring long-term consequences near," is a very concise description of a clever way to short-circuit discounting. It's very interesting, but I'm not really comfortable with paying money as a consequence. Instead, I'm going to try a similar technique: bringing long-term rewards near.
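
To make "discounting" a little more concrete: the standard hyperbolic model says a reward's subjective value falls off as its full value divided by one plus a constant times the delay. Here's a toy calculation; the constant and the amounts are made up purely to show the shape of the curve, not measurements of anyone's brain.

```python
# Hyperbolic discounting (Mazur's model): V = A / (1 + k * D), where A is the
# reward's undiscounted value, D is the delay, and k is a per-person constant.
# The k and the amounts below are invented just to illustrate the curve.

def discounted_value(amount, delay_days, k=1.0):
    return amount / (1 + k * delay_days)

print(discounted_value(10, 0))  # a reward right now:         10.0
print(discounted_value(10, 3))  # the same reward in 3 days:   2.5
```

With numbers like these, anything three days out feels worth a quarter of its face value, which is the gap tools like BeeMinder (and this project) try to close.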

There are already a lot of techniques for bringing long-term rewards near. Generally, they're called reinforcement. The classic reward in reinforcement is candy, which seems like a good idea: I like it, and I'm more than willing to abuse my youthful metabolism for productivity. And, in fact, there are a wide variety of folk solutions of that sort—advice to reward yourself with some candy once your work is done. I've tried those already, but they never seem to work out for me—I always seem to wind up cheating. I need to do something trickier.

CFAR describes reinforcement in a very striking way in some of their course materials: they call it "training your inner pigeon." Not only is that a nice, snappy turn of phrase, it also neatly illustrates the problem with attempting to self-administer rewards. Did Skinner's pigeons self-administer their rewards? No, of course they didn't. I shouldn't expect my inner pigeon to, either. So, my next step is to build a robot that gives me candy when I get stuff done.

Why do I think I can keep from cheating on the machine, when I couldn’t restrain myself from cheating on regular old bags of candy? Well, I’m far from certain; it’s my biggest worry with the project, in fact. But I am reasonably confident, because the machine will give me an easy way to establish a Schelling fence. Where taking a handful of candy out of the bag is sometimes right and sometimes wrong, taking a handful of candy out of the hopper is always wrong, since the machine will dispense the candy when I deserve it. Precommitting to never take candy out of the machine seems like it’ll be a lot easier than precommitting to only sometimes take candy out of the bag.

Now, the description “robot” for my machine is a bit fanciful. It’s actually an automatic dog feeder, modified and connected to the Internet. It has a small screen mounted on the front, which tells me how many rewards I’ve earned. If I’ve got any, I can press a button on the screen to dispense them. Not counting parts I already owned, the device cost me around $50 to build. To provide the data, I linked the device to an earlier productivity hack that I already had around, a custom webapp integrating a task list with a Pomodoro timer.
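
The software side is simple enough to sketch in a few lines. This isn't the real firmware: the endpoint URL, the JSON field, and the two stub functions are placeholders for whatever your own task tracker and dispenser hardware actually expose.

```python
import time
import requests  # assumes the standard `requests` HTTP library

API_URL = "https://example.com/api/rewards"  # hypothetical webapp endpoint

def button_pressed():
    """Stub for reading the dispense button on the front-panel screen."""
    return True

def dispense(count):
    """Stub for the code that actually drives the dog feeder's motor."""
    print(f"Dispensing {count} handful(s) of candy")

def main(poll_seconds=60):
    # Poll the webapp for redeemable rewards; dispense when the button is hit.
    while True:
        redeemable = requests.get(API_URL, timeout=10).json().get("redeemable", 0)
        if redeemable and button_pressed():
            dispense(redeemable)
        time.sleep(poll_seconds)

if __name__ == "__main__":
    main()
```

The interesting part isn't the code at all; it's that the candy now sits behind a motor I've precommitted not to bypass.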

Rewards are given based on a few simple rules. When I finish a task early, the system gives me one reward per day of earliness; if I finish tasks out of order, it pays out based on the nearer task's deadline instead, so I've got an incentive to finish tasks in order. I also get an extra reward for the first Pomodoro I spend on each project in a given week, so that I have an incentive not to forget old projects. The system can also take away rewards. If I get distracted during a Pomodoro, I lose a reward. I'm blocked from redeeming rewards while any task is within a day of its deadline. If I finish a task more than a day late, I lose every reward in the system.
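
Here's roughly what those rules look like in code. This is a sketch rather than the webapp's actual implementation: the Task structure and function names are mine, and the weekly first-Pomodoro bonus is left out for brevity.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Task:
    name: str
    deadline: date

def reward_for_completion(task, open_tasks, balance, today):
    """Return the new reward balance after finishing `task` on `today`."""
    days_early = (task.deadline - today).days
    if days_early < -1:
        return 0  # finished more than a day late: lose everything banked
    if open_tasks:
        # Out-of-order completions pay out against the nearest open deadline,
        # which is sooner, so skipping ahead earns less than working in order.
        nearest = min(t.deadline for t in open_tasks)
        days_early = min(days_early, (nearest - today).days)
    return balance + max(days_early, 0)

def distraction_penalty(balance):
    """Getting distracted during a Pomodoro costs one banked reward."""
    return max(balance - 1, 0)

def can_redeem(balance, open_tasks, today):
    """Redeeming is blocked while any task is within a day of its deadline."""
    return balance > 0 and all(t.deadline - today > timedelta(days=1)
                               for t in open_tasks)
```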

Results have been mixed so far. My greatest concern seems to have been unjustified: I haven't cheated on the machine once. However, it seems like the rules need some more work. The system has definitely helped some, but there are a lot of problems left to fix.

The system doesn’t account for the difficulty of tasks, meaning that I get more reward for less effort if I do easier work. As a result, I’ve done all of the reading up to next Tuesday for my literature class, but my Computer Science assignment due on Friday is unfinished, and my “research” for an exceptionally abhorrent humanities course is languishing on the vine.

The point of the system was to bring long-term rewards near, but there are a lot of circumstances in which it doesn't seem to bring them quite near enough. For deadlined tasks, I get no rewards until I've actually completed the task; if I think a task will take me more than a day to finish, that's more than a day of work that earns me no short-term rewards. This gets even worse when a long task (or a batch of short tasks) reaches the day before its deadline: then I don't get any rewards until everything due is finished. While this is quite motivating, it's still long-term motivation, which is to say it doesn't work very well.

I deliberately built the system to encourage doing tasks in order, but this seems to have backfired a little bit. Since I would be giving up rewards, I don't want to work on a task that's due later if there's another that's due sooner. However, if I really don't want to do the nearer task, I end up wasting time instead, since working on a less urgent task wouldn't earn me anything either. Nyan_sandwich describes a similar failure mode in his Akrasia Case Study: if I know I have something more urgent to do, but I don't want to do it, I wind up procrastinating instead of doing less urgent things.

I get sick of candy more quickly than I expected. The portion my machine emits (about a small handful) tends to stop motivating me after about four in a day. Additionally, I seem to be entirely incapable of pacing myself; if a reward is in the system, I tend not to wait very long before using it. This has crippled all of the rules involving taking away rewards—unless the rewards are blocked, they don't stick around in the system long enough to be taken away.

Not all of the things I want to change are a result of problems, though. There are a wide variety of interesting improvements I could make. Many of these are expansions: aside from my task list, what else can I connect to? Can I track note-taking in class? Can I set it up to reward continuing effort towards a task, like writing a few hundred words a day? Can I use it to create new, more rational habits? There are all kinds of possibilities to consider. If you’ve got anything you’d like to suggest, let me know—I’m open to anything interesting.

There are also a lot of techniques to research; I’m sure the program isn’t nearly as effective as it could be. Operant conditioning techniques like variable-ratio schedules might help improve performance per candy. Or, I could look into gamification, basically a form of applied human operant conditioning; it’s not a standard tool on the site, but if you’ve ever watched an experience bar rise, you know what I’m talking about. Again, if you happen to have some relevant ideas, let me know.
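
If I do try a variable-ratio schedule, the simplest software version is a random-ratio payout, where each completed unit of work has a one-in-N chance of dispensing. This is only a sketch under that assumption; the function name and the mean ratio are arbitrary choices of mine.

```python
import random

# A random-ratio payout, the usual way to approximate a variable-ratio
# schedule in software: each response has a 1-in-N chance of reinforcement,
# so rewards arrive after an unpredictable number of responses averaging N.

def variable_ratio_payout(mean_ratio=3):
    """Return True (dispense a reward) with probability 1/mean_ratio."""
    return random.random() < 1 / mean_ratio

# For example, call this once per finished Pomodoro instead of paying out
# every time, stretching each candy across more work.
if variable_ratio_payout():
    print("Reward earned!")
```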

Obviously, I’m going to be making some rule changes in the near future. Expect another post in a few weeks about what’s changed and how the changes have worked out for me.

Also, does anyone want to help me think of a good name for the system? Right now it’s called the “extrinsic motivator.” While descriptive, this name isn’t snappy at all.