The Power of Reinforcement

Story 1:

On Skype with Eliezer, I said: “Eliezer, you’ve been unusually pleasant these past three weeks. I’m really happy to see that, and moreover, it increases my probability that an Eliezer-led FAI research team will work. What caused this change, do you think?”

Eliezer replied: “Well, three weeks ago I was work­ing with Anna and Ali­corn, and ev­ery time I said some­thing nice they fed me an M&M.”

Story 2:

I once witnessed a worker who hated keeping a work log because it was only used “against” him. His supervisor would call to say “Why did you spend so much time on that?” or “Why isn’t this done yet?” but never “I saw you handled X, great job!” Not surprisingly, he often “forgot” to fill out his work log.

Ever since I got ev­ery­one at the Sin­gu­lar­ity In­sti­tute to keep work logs, I’ve tried to avoid con­nec­tions be­tween “con­cerned” feed­back and staff work logs, and in­stead take time to com­ment pos­i­tively on things I see in those work logs.

Story 3:

Chat­ting with Eliezer, I said, “Eliezer, I get the sense that I’ve in­ad­ver­tently caused you to be slightly averse to talk­ing to me. Maybe be­cause we dis­agree on so many things, or some­thing?”

Eliezer’s re­ply was: “No, it’s much sim­pler. Our con­ver­sa­tions usu­ally run longer than our pre­vi­ously set dead­line, so when­ever I finish talk­ing with you I feel drained and slightly cranky.”

Now I finish our con­ver­sa­tions on time.

Story 4:

A ma­jor Sin­gu­lar­ity In­sti­tute donor re­cently said to me: “By the way, I de­cided that ev­ery time I donate to the Sin­gu­lar­ity In­sti­tute, I’ll set aside an ad­di­tional 5% for my­self to do fun things with, as a mo­ti­va­tion to donate.”

The power of reinforcement

It’s amaz­ing to me how con­sis­tently we fail to take ad­van­tage of the power of re­in­force­ment.

Maybe it’s be­cause be­hav­iorist tech­niques like re­in­force­ment feel like they don’t re­spect hu­man agency enough. But if you aren’t treat­ing hu­mans more like an­i­mals than most peo­ple are, then you’re mod­el­ing hu­mans poorly.

You are not an agenty ho­muncu­lus “cor­rupted” by heuris­tics and bi­ases. You just are heuris­tics and bi­ases. And you re­spond to re­in­force­ment, be­cause most of your mo­ti­va­tion sys­tems still work like the mo­ti­va­tion sys­tems of other an­i­mals.

A quick re­minder of what you learned in high school

  • A re­in­forcer is any­thing that, when it oc­curs in con­junc­tion with an act, in­creases the prob­a­bil­ity that the act will oc­cur again.

  • A pos­i­tive re­in­forcer is some­thing the sub­ject wants, such as food, pet­ting, or praise. Pos­i­tive re­in­force­ment oc­curs when a tar­get be­hav­ior is fol­lowed by some­thing the sub­ject wants, and this in­creases the prob­a­bil­ity that the be­hav­ior will oc­cur again.

  • A nega­tive re­in­forcer is some­thing the sub­ject wants to avoid, such as a blow, a frown, or an un­pleas­ant sound. Nega­tive re­in­force­ment oc­curs when a tar­get be­hav­ior is fol­lowed by some re­lief from some­thing the sub­ject doesn’t want, and this in­creases the prob­a­bil­ity that the be­hav­ior will hap­pen again.
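
The definitions above can be made concrete with a toy model. The sketch below uses a Bush-Mosteller-style linear update, a standard textbook learning rule that is swapped in here for illustration and does not come from this post or its cited studies. The function names and the `rate` parameter are hypothetical. Positive and negative reinforcement would both use the same update in this sketch, since both are defined as increasing the probability of the act.

```python
def reinforce(p: float, rate: float = 0.1) -> float:
    """Return the new probability of the act after one reinforced occurrence.

    Assumed update rule: move p a fraction `rate` of the way toward 1.
    """
    return p + rate * (1.0 - p)


def simulate(p0: float, n_reinforced: int, rate: float = 0.1) -> list[float]:
    """Track the probability of the act across n reinforced occurrences."""
    history = [p0]
    for _ in range(n_reinforced):
        history.append(reinforce(history[-1], rate))
    return history


if __name__ == "__main__":
    # Each reinforced occurrence raises the probability, with smaller
    # gains as the probability approaches 1.
    for trial, p in enumerate(simulate(0.2, 5)):
        print(trial, round(p, 3))
```

The only point of the sketch is the direction of the change: each time the act is followed by a reinforcer, its probability goes up. The specific numbers are arbitrary.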

What works

  1. Small re­in­forcers are fine, as long as there is a strong cor­re­la­tion be­tween the be­hav­ior and the re­in­forcer (Sch­nei­der 1973; Todorov et al. 1984). All else equal, a large re­in­forcer is more effec­tive than a small one (Christo­pher 1988; Lud­vig et al. 2007; Wolfe 1936), but the more you in­crease the re­in­forcer mag­ni­tude, the less benefit you get from the in­crease (Frisch & Dick­in­son 1990).

  2. The re­in­forcer should im­me­di­ately fol­low the tar­get be­hav­ior (Es­co­bar & Bruner 2007; Sch­linger & Blakely 1994; Sch­nei­der 1990). Pryor (2007) notes that when the re­ward is food, small bits (like M&Ms) are best be­cause they can be con­sumed in­stantly in­stead of be­ing con­sumed over an ex­tended pe­riod of time.

  3. Any fea­ture of a be­hav­ior can be strength­ened (e.g., its in­ten­sity, fre­quency, rate, du­ra­tion, per­sis­tence, its shape or form), so long as a re­in­forcer can be made con­tin­gent on that par­tic­u­lar fea­ture (Neur­inger 2002).
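
The first two points can be sketched numerically. The functional forms below are assumptions chosen for illustration, not results from the cited studies: a logarithm stands in for diminishing returns on reinforcer magnitude, a hyperbolic discount stands in for the cost of delay, and a simple difference of probabilities stands in for the strength of the correlation between behavior and reinforcer. All names and the parameter `k` are hypothetical.

```python
import math


def magnitude_value(magnitude: float) -> float:
    """Assumed concave value: grows with reinforcer size, but ever more slowly."""
    return math.log1p(magnitude)


def delayed_value(value: float, delay_s: float, k: float = 1.0) -> float:
    """Assumed hyperbolic discount: the longer the delay, the less the reinforcer counts."""
    return value / (1.0 + k * delay_s)


def contingency(p_reward_given_act: float, p_reward_without_act: float) -> float:
    """How much more likely the reinforcer is when the act occurs than when it does not."""
    return p_reward_given_act - p_reward_without_act


if __name__ == "__main__":
    # Going from 1 to 2 units adds more value than going from 2 to 3.
    print(magnitude_value(2) - magnitude_value(1))
    print(magnitude_value(3) - magnitude_value(2))
    # The same reinforcer counts for less after a delay.
    print(delayed_value(1.0, 0.0), delayed_value(1.0, 5.0))
    # A small reinforcer delivered reliably after the act is strongly contingent on it.
    print(contingency(0.9, 0.1))
```

Under these assumptions, a small M&M delivered instantly and reliably can outscore a large reward delivered late or haphazardly, which matches the qualitative advice in the list above.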

Ex­am­ple applications

  • If you want some­one to call you, then when they do call, don’t nag them about how they never call you. In­stead, be en­gag­ing and pos­i­tive.

  • When try­ing to main­tain or­der in a class, ig­nore un­ruly be­hav­ior and praise good be­hav­ior (Mad­sen et al. 1968; McNa­mara 1987).

  • Re­ward origi­nal­ity to en­courage cre­ativity (Pryor et al. 1969; Cham­bers et al. 1977; Eisen­berger & Armeli 1997; Eisen­berger & Rhoades 2001).

  • If you want stu­dents to un­der­stand the ma­te­rial, don’t get ex­cited when they guess the teacher’s pass­word but in­stead when they demon­strate a tech­ni­cal un­der­stand­ing.

  • To help someone improve at dance or sport, ignore poor performance but reward good performance immediately, for example by shouting “Good!” (Buzas & Ayllon 1981). The reason to ignore poor performance is that if you say “No, you’re doing it wrong!” you are inadvertently punishing the effort. A better response to a mistake would be to reinforce the effort: “Good effort! You’re almost there! Try once more.”

  • Reward honesty to help people be more honest with you (Lanza et al. 1982).

  • Re­ward opinion-ex­press­ing to get peo­ple to ex­press their opinions more of­ten (Ver­planck 1955).

  • You may even be able to re­in­force-away an­noy­ing in­vol­un­tary be­hav­iors, such as twitches (Lau­renti-Lions et al. 1985) or vom­it­ing (Wolf et al. 1965).

  • Want a young infant to learn to speak more quickly? Reinforce their attempts at vocalization (Ramey & Finkelstein 1978).

  • More train­ing should oc­cur via video games like DragonBox, be­cause com­puter pro­grams can eas­ily provide in­stant re­in­force­ment many times a minute for very spe­cific be­hav­iors (Fletcher-Flinn & Gra­vatt 1995).

For ad­di­tional ex­am­ples and stud­ies, see The Power of Re­in­force­ment (2004), Don’t Shoot the Dog (2006), and Learn­ing and Be­hav­ior (2008).

I close with Story 5, from Amy Sutherland:

    For a book I was writ­ing about a school for ex­otic an­i­mal train­ers, I started com­mut­ing from Maine to Cal­ifor­nia, where I spent my days watch­ing stu­dents do the seem­ingly im­pos­si­ble: teach­ing hye­nas to pirou­ette on com­mand, cougars to offer their paws for a nail clip­ping, and ba­boons to skate­board.

    I listened, rapt, as pro­fes­sional train­ers ex­plained how they taught dolphins to flip and elephants to paint. Even­tu­ally it hit me that the same tech­niques might work on that stub­born but lov­able species, the Amer­i­can hus­band.

    The cen­tral les­son I learned from ex­otic an­i­mal train­ers is that I should re­ward be­hav­ior I like and ig­nore be­hav­ior I don’t. After all, you don’t get a sea lion to bal­ance a ball on the end of its nose by nag­ging. The same goes for the Amer­i­can hus­band.

    Back in Maine, I be­gan thank­ing Scott if he threw one dirty shirt into the ham­per. If he threw in two, I’d kiss him. Mean­while, I would step over any soiled clothes on the floor with­out one sharp word, though I did some­times kick them un­der the bed. But as he basked in my ap­pre­ci­a­tion, the piles be­came smaller.

    I was us­ing what train­ers call “ap­prox­i­ma­tions,” re­ward­ing the small steps to­ward learn­ing a whole new be­hav­ior...

    Once I started think­ing this way, I couldn’t stop. At the school in Cal­ifor­nia, I’d be scrib­bling notes on how to walk an emu or have a wolf ac­cept you as a pack mem­ber, but I’d be think­ing, “I can’t wait to try this on Scott.”

    ...After two years of ex­otic an­i­mal train­ing, my mar­riage is far smoother, my hus­band much eas­ier to love.

My thanks to Erica Edelman for doing much of the research for this post.