D_Malik comments on Reinforcement and Short-Term Rewards as Anti-Akratic

D_Malik 14 Apr 2013 2:53 UTC
12 points
This is awesome! I’m really excited because I’ve been playing around with related things for a while, and by sharing techniques we can all become stronger!

So: A couple months ago I had a massive problem with my Anki reviews. I use Anki for a lot of things that aren’t standard question-answer training, and this other work tends to be really mentally effortful. So, naturally, the reviews piled up until I had a massive backlog, around 10,000 reviews.

I tried a bunch of tricks; from this post (“Applying Behavioral Psychology on Myself”) and elsewhere I got ideas for two Anki plugins intended to reinforce reviewing, by manipulating music volume or by showing reinforcing pictures. This was somewhat effective, but not very. Then I moved on to using candy, which worked better but still wasn’t very effective.

But the latest thing I’ve been trying works a lot better. Look at my Anki review stats: [Image: Anki review stats]

The technique that did that was this:
- Reinforce reviews by giving a 20% chance of reward upon the completion of each.
- A reward is a small piece of food or a sip of a drink.
- Never eat or drink anything except when you’ve earned a reinforcer.
As you procrastinate, you become hungrier and hungrier until your desire for rewards exceeds your desire for non-work. By keeping rewards small, you remain perpetually hungry and work remains reinforcing. The brain was built to extrapolate from “I’m less hungry” to “I should do whatever I just did more often”.
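
A minimal sketch of the reward roll described above, assuming bash and a manual workflow where the check is run after finishing each review. The function names (`roll_percent`, `earned_reward`, `after_review`) are hypothetical and are not taken from the Anki plugins; the 20% threshold is the one from the list.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not from the Anki plugins.

# Print a roughly uniform integer from 0 to 100.
# bash's RANDOM ranges over 0..32767, so dividing by 327 gives 0..100.
roll_percent() {
    echo $(( RANDOM / 327 ))
}

# Succeed (exit status 0) when ROLL is below CHANCE, which happens about
# CHANCE percent of the time when ROLL comes from roll_percent.
earned_reward() {
    roll=$1
    chance=$2
    [ "$roll" -lt "$chance" ]
}

# Call once after each completed review.
after_review() {
    if earned_reward "$(roll_percent)" 20; then
        echo "Reward: one small bite or sip."
    else
        echo "No reward this time."
    fi
}
```

Sourcing this file and typing `after_review` after each card is enough to try the scheme by hand. Keeping the roll separate from the threshold check makes it easy to change the 20% later.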

This is especially good for something that needs to be done regularly, like Anki reviews. If rewards and reward-probabilities are small enough, it also functions as caloric restriction. This system is also good for very granular tasks, like question-answer Anki reviews.

My non-question-answer reviews have irregular lengths and so aren’t as granular, so for them I use another reinforcement system in addition to the old one: every 10 seconds, with probability ~20%, show a message in the background. When that appears, I examine my thoughts just prior to it appearing and reward if they were about work or other productive things. This also seems to work well, and works better for tasks where you don’t want to incentivize rushing through things. (For question-answer pairs, success is measurable in correct responses, so generally you should rush through them, as long as you still get the correct responses.)
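
For a sense of how often that message shows up: a check every 10 seconds that fires about 20% of the time averages one message per 10 / 0.20 = 50 seconds. A small sketch of the arithmetic, using a hypothetical function name:

```shell
# Hypothetical helper: average seconds between messages when a check runs
# every INTERVAL seconds and fires with probability CHANCE percent.
# On average it takes 100 / CHANCE checks for one to fire.
mean_gap_seconds() {
    interval=$1
    chance=$2
    echo $(( interval * 100 / chance ))
}

mean_gap_seconds 10 20   # prints 50
mean_gap_seconds 10 10   # prints 100
```

This is only the average; individual gaps vary a lot, which is the point of variable reinforcement.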

I also have a thing that periodically asks whether I’m in a correct posture, and standing instead of sitting, and not procrastinating sleep. If I’m in the right state, those give additional medium-sized rewards. I implemented this only 2 days ago, so I don’t know if it works yet.

I strongly encourage you to try variable reinforcement. My impression from reading is that it works a lot better than fixed reinforcement, though I haven’t tried the fixed kind myself.

A similar thing I’ve been experimenting with is punishing unwanted behaviors, mostly by using the rubber band technique; mixed results so far.

I’m very interested in your automatic dog-feeder setup. I’ve been thinking about further automating my reinforcement things, and about automating punishment by buying an electroshock collar (normally used to train dogs).

How to implement some of the stuff above:

See the two Anki plugins I posted. I just put up a more basic version that shows popups instead of pictures.

On Ubuntu Linux, to show a background popup every 10 seconds (sorta), with ~20% probability each time, run crontab -e and add this line:

* * * * * bash -c "export DISPLAY=:0 && for i in 1 2 3 4 5; do if [ \$(( \$RANDOM / 327 )) -lt 20 ] ; then notify-send 'REINFORCE?'; (sleep 3; killall notify-osd ) & fi; sleep 10; done"
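
The one-liner is dense, so here is the same logic unrolled into a commented script, as a sketch rather than a drop-in replacement. It assumes bash. The function names (`should_prompt`, `prompt_loop`) and the overridable `NOTIFY` and `TICK` variables are hypothetical additions for illustration; `notify-send` and `killall notify-osd` are taken from the cron line above.

```shell
#!/usr/bin/env bash
# Sketch of the cron one-liner above; names are illustrative.

NOTIFY=${NOTIFY:-notify-send}   # command used to show the popup
TICK=${TICK:-10}                # seconds between checks

# RANDOM is 0..32767 in bash, so RANDOM / 327 is 0..100.
# "Less than 20" is therefore true about 20% of the time.
should_prompt() {
    [ $(( RANDOM / 327 )) -lt "${1:-20}" ]
}

# One cron invocation: five checks, TICK seconds apart.
prompt_loop() {
    for i in 1 2 3 4 5; do
        if should_prompt 20; then
            "$NOTIFY" 'REINFORCE?'
            # Dismiss the popup after 3 seconds, as the one-liner does.
            ( sleep 3; killall notify-osd ) >/dev/null 2>&1 &
        fi
        sleep "$TICK"
    done
}
```

Cron starts the job once a minute and the loop checks at 0, 10, 20, 30 and 40 seconds, so the 50-second mark is skipped; that is presumably the "sorta". To use the script form, add a final line calling `prompt_loop`, save it somewhere (the path is up to you), and have the crontab entry run that file with `DISPLAY=:0` set, as in the original.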

That’s what I currently use; I used to use this, which shows foreground popups and should thus probably be kept commented-out most of the time:

* * * * * bash -c "export DISPLAY=:0 && for i in 1 2 3 4 5; do if [ \$(( \$RANDOM / 327 )) -lt 10 ] ; then zenity --info --text='\n\nREINFORCE?' --timeout=5 --width=1000 --height=800 & fi; sleep 10; done; if [ \$(( \$RANDOM / 327 )) -lt 10 ] ; then zenity --info --text='\n\nREINFORCE?' --timeout=5 --width=1000 --height=800 & fi"

If anyone is planning to use any of this, please tell me. Also please share any ideas you have, even if they don’t seem useful.
- Richard_Kennaway 15 Apr 2013 13:23 UTC
  4 points
  
  I’ve been thinking about further automating my reinforcement things, and about automating punishment by buying an electroshock collar (normally used to train dogs).
  
  I don’t know if you’re being serious here, but I am.
  
  Beware of shock collars. Dogs have much thicker skin than humans, it’s more loosely attached to the underlying tissue, and it’s covered with fur. A mild zap for a dog might be too dangerous to apply to a human neck. I suggest contacting your local BDSM community for advice on where and how to safely give yourself electric shocks. They may also be able to advise on ways of making it impossible to take off until the time is up. Although, as I’ve said in a top-level comment to this post, whatever the setup, the conditionality between behaviour and reward is an exercise in role-playing. In reality you can eat and drink whatever you want whenever you want, and you are choosing to imagine a connection with doing your Anki work.
  - gwern 15 Apr 2013 17:28 UTC
    7 points
    More importantly, with conditioning, there’s always the question of what exactly you are doing. (Particularly acute in cases of positive punishment like electroshock.) As far as the suggestion goes, well… I cannot seem to find it again, but within the last 2 or 3 years I ran into a blog where the author set up an electric shock apparatus for himself hooked up to some software. His final post said (or he said when I asked, I forget) that he stopped because he wound up training himself not to use it, and it became too aversive to put on.
- David_Gerard 15 Apr 2013 11:24 UTC
  3 points
  
  I’m very interested in your automatic dog-feeder setup. I’ve been thinking about further automating my reinforcement things, and about automating punishment by buying an electroshock collar (normally used to train dogs).
  
  At this point I thought “ok, that was an extended Modest Proposal”.
- Intrism 15 Apr 2013 14:14 UTC
  1 point
  
  every 10 seconds, with probability ~20%, show a message in the background. When that appears, I examine my thoughts just prior to it appearing and reward if they were about work or other productive things.
  
  Have you had any problems with the context switching? It seems like being interrupted every ~50 seconds would make me less productive.