Rain comments on The Power of Reinforcement

Rain 21 Jun 2012 3:13 UTC
18 points
That’s why I tried to stay positive when talking about the new SI website. Especially with technical changes like that, the (vocal) negative response can be overwhelming.
- lukeprog 21 Jun 2012 3:37 UTC
  32 points
  Parent
  Yup. When reading through the comments about the new website, I could feel my effort being punished.
  - pjeby 21 Jun 2012 16:52 UTC
    20 points
    Parent
    
    Yup. When reading through the comments about the new website, I could feel my effort being punished.
    
    Perhaps you could have somebody read them for you and summarize them in a non-critical way, thus creating a reinforcement shield.
    
    Alternately, you could adapt what internet marketing “personalities” do, and promote doing: practice celebrating criticism. One marketer (I forget which one) described making a practice of throwing his hands in the air and shouting “Woo!” when he received a criticism via email.
    
    (Background: “personality” marketers promote by writing emotionally charged material that’s intended to divide their audience into people who either love or hate them. Thus, the presence of hate mail is evidence that their strategy is working. They will then often publicize the hate mail, in order to stir up the emotions of the people on the opposite side of the debate. Talk radio hosts, bloggers, political commentators, etc. also use these strategies, even if they’re not always considered “marketers” in a traditional sense. Whether you consider this “dark arts” is largely a political question, since the LW sequences use these tactics also. Whether he knows it or not, Eliezer is a personality marketer in this sense, it’s just that he’s not as efficiently monetizing the results. ;-) )
  - John_Maxwell 21 Jun 2012 5:45 UTC
    7 points
    Parent
    Sorry about that.
    
    It seems to me that if humans were emotionless utility maximizers, we would prefer hearing criticism over praise, the same way programmers purchase more utility by fixing bugs in their programs than by polishing features that already work. I suspect criticism is generally more valuable from a pure decision theoretic perspective.
    
    I wonder if there is an effective way to buy encouragement and criticism separately. Also, it’s hard to know exactly how best to encourage folks. In theory it’s possible that making a new website is not the best use of SI’s resources, which suggests reinforcement would not be optimal. But we still may want to reinforce you towards the more general behavior of taking steps to achieve your organizational goals. So what’s the best response?
    
    Maybe someone can develop some general guidelines for reinforcing/criticizing people, similar to what the nonviolent communication people came up with. (When {observable event} happened, I felt {feeling} because I need/value {underlying need that felt unmet or value that felt jeopardized}. Would you be willing to {specific request that person could do} in the future?) E.g. check to see if the person was acting with good intentions and reinforce them for those if they existed, check for super goals you endorse and reinforce them for working to accomplish those, check to see if the person could just as easily have sat around doing nothing and reinforce them for expending effort if this was the case, etc.
    
    I think optimally criticism would have lots more reinforcers associated with it: people should be reinforced for requesting, giving, and receiving criticism because these are all activities that are naturally aversive but actually have high expected value.
    
    So, I wholeheartedly endorse the following actions of yours: attempting to maximize humanity’s collective utility function, working on the super goal of AGI safety, actually doing stuff, and deliberately gathering critical feedback. Go Luke!
    - handoflixue 22 Jun 2012 19:27 UTC
      0 points
      Parent
      Criticism > Praise > Nothing. The problem is, people default to “Criticize or stay quiet”, and so I tend to value praise highly, as it’s culturally much rarer.
      
      Also, if it’s a matter of opinion (rather than an actual code bug), praise can actively offset criticism (1 person dislikes it, but the other 99 users all love the new UI… probably not wise to revert!)
      - TimS 22 Jun 2012 19:32 UTC
        0 points
        Parent
        Edit: This statement is basically wrong. I was confusing negative instruction with punishment. Comment preserved for continuity.
        
        Interestingly, adding a stimulus that decreases the frequency of a behavior (aka positive punishment) is less effective at changing behavior frequency than positive reinforcement.
        
        That is, reinforcing an alternate behavior is more effective at decreasing a problem behavior than simply punishing the problem behavior. (I think this is even true when there are only two possible behaviors).
        TheOtherDave 22 Jun 2012 19:41 UTC
        0 points
        Parent
        Really? I’d love a reference for this. My understanding was always that positive punishment has a stronger effect on behavior frequency than (for example) training an incompatible behavior, but also has lots of other effects that I don’t want to instill, which are often more important than maximizing effect on behavior frequency (e.g., reducing the rate at which novel behaviors are offered).
        TimS 22 Jun 2012 19:54 UTC
        0 points
        Parent
        Let me clarify slightly, because I wasn’t trying to say something earth-shaking. If I did say something earth-shaking, I’m probably wrong.
        
        My statement was made assuming that Bob already has a problem behavior that we would like to decrease the frequency of, and eventually extinguish. To be more concrete, let’s say Bob has bathroom accidents (he voids away from the toilet). All I meant to say was the statement “Good job going pee-pee on the potty” is more effective at reducing the frequency of accidents than “Bob, you shouldn’t go pee-pee in your underwear.”
        
        Yes, I’m toilet training my son—why do you ask? :)
        TheOtherDave 22 Jun 2012 21:03 UTC
        1 point
        Parent
        Well… hrm. That might very well be true about toilet-training, as increased anxiety is one of the side-effects of positive punishment, and anxiety interacts exceptionally poorly with bladder control. So, I dunno.
        
        But in general, I’m pretty sure what you’re saying isn’t quite right. If I want to extinguish, say, jumping on the couch, consistently punishing incidents of jumping on the couch will extinguish the behavior much faster than pretty much anything else I can do.
        
        Please don’t misunderstand me; I absolutely don’t endorse this as a training technique. But the reason I reject it isn’t because it doesn’t extinguish the behavior quickly… it does. The reason I reject it is because it creates a host of related side-effects that make subsequent training much more difficult, not to mention make the subsequent relationship with the trainer (and often with everyone else) much more unpleasant for the trainee.
        
        Punishment is a blunt axe, but it’s a powerful blunt axe.
        TimS 22 Jun 2012 23:47 UTC
        3 points
        Parent
        I talked with my wife, the future BCBA, and it appears that my intellectual reach has exceeded my grasp. First, I seem to have confused positive reinforcement v. punishment and positive and negative instruction. It is the case that negative instruction (“Don’t throw your toy car”) is less effective than positive instruction (“We only throw balls”).
        
        Second, there are some interventions, reinforcing and punishing, that could teach in one trial (consider heroin injections as reinforcement and flamethrowers as punishment). Edit: my wife says this point is about salience.
        
        Third, best practices among behavior analysts are to use reinforcement prior to using punishment. My wife says that this is for ethical reasons—her reference book didn’t talk about the relative effectiveness of reinforcement and punishment.
        TheOtherDave 23 Jun 2012 1:26 UTC
        1 point
        Parent
        
        I seem to have confused positive reinforcement v. punishment and positive and negative instruction.
        
        Ah! Yes, that makes sense. Negative instruction doesn’t work very well, it’s true.
        
        there are some interventions, reinforcing and punishing, that could teach in one trial
        
        Mm… yeah, that’s a good point. I was eliding the distinction between salience and reward/punishment, and ought not have.
  - wedrifid 21 Jun 2012 5:16 UTC
    2 points
    Parent
    
    Yup. When reading through the comments about the new website, I could feel my effort being punished.
    
    I am slightly surprised to hear this. I perhaps expected slightly less emotional involvement with the effort and more of a “Go! Fix!” feeling.
    - lukeprog 21 Jun 2012 5:20 UTC
      3 points
      Parent
      What happened is that (1) I felt my effort being punished, and then (2) I sent an email to Nickolai or Kamil asking them to fix X.
      - wedrifid 21 Jun 2012 5:26 UTC
        14 points
        Parent
        
        I sent an email to Nickolai or Kamil asking them to fix X.
        
        Great work Nickolai or Kamil, if either of you read lesswrong at all. The website is a much needed improvement! ;)
      - wedrifid 21 Jun 2012 5:48 UTC
        7 points
        Parent
        
        (2) I sent an email to Nickolai or Kamil asking them to fix X.
        
        I’ve noticed (while being such a minion) that when making such change requests yourself, you manage to do so with a frame that minimises a criticism vibe or ‘effort punishment’ feelings. I would pay many, many M&Ms for that effort in careful phrasing.
      - NancyLebovitz 22 Jun 2012 3:16 UTC
        0 points
        Parent
        Thank you for toughing it out.
        
        I’m sorry if my comments were too harsh.
  - Vladimir_Nesov 21 Jun 2012 18:56 UTC
    0 points
    Parent
    I don’t have that response, which probably accounts for me not being sufficiently mindful of expressing criticism to others… Do you think there may be a way to train a positive or neutral response to criticism? Are there effective methods for making criticism less painful to a typical person?
- dbaupp 21 Jun 2012 5:49 UTC
  4 points
  Parent
  I tried to do the same. Although, I was probably significantly less successful than I’d have liked to be (sorry Luke, Nickolai, Kamil and anyone else who’d made an effort!).
  
  Also, given lukeprog’s comment, this unfortunately appears to be a case of history repeating itself: matt had a similarly negative experience when LW was redesigned a little while ago.
  - wedrifid 21 Jun 2012 6:28 UTC
    9 points
    Parent
    
    matt had a similarly negative experience when LW was redesigned a little while ago.
    
    That circumstance is somewhat different in nature. While, as far as I know, nobody wanted matt to experience negative affect, the discouragement of ‘effort’ was actually a perceived instrumental good, given an expectation that more effort would produce undesired outcomes.
    
    I note that this relies on beliefs at the time. In that context users had to make the prediction “If a website administrator implements detrimental changes when previous discussion had already explained why such a thing was not desired and a prediction had been made that a change to the website would probably be bad, what is the probability that future ‘effort’ will be beneficial?” The answer is very, very low. The emotional distress matt experienced was his social instincts warning him that interfering with the tribe when they do not want you to is a dangerous act, especially when that interference is to (in effect) institute a prohibition against something they could previously do.
    
    It turns out, however, that matt is a superior human being to the typical person in his role. While his ego did cause him to act more defensive than optimal and seemed to cause him to experience emotional distress it did not cripple his ability to respond to user feedback or cause him to lash out with actions against the users as many would. The undesired change was eventually fixed, as were the few bugs that were introduced.
    
    I expect users to drastically update how they would respond to matt if he made future website upgrades due to having more information about matt. He definitely deserves a lot of rewarding for going ahead and doing the bugfixes and implementing ‘retraction then deletion’ despite having received discouragement. He lives (here) in Melbourne. Perhaps I should give him a packet of M&Ms if I run into him at one of our meetups!