Linda Linsefors comments on Redwood Research’s current project

Linda Linsefors 24 Apr 2022 12:55 UTC
LW: 5 AF: 2
0
AF
There’s one thing you can do that definitely works, which is to only get labels for snippets which are just barely considered safe enough by your classifier. Eg if your threshold is set to 99%, so that a completion won’t be accepted unless the classifier is 99% sure that it’s safe, then there’s no point looking at completions rated as <99% likely to be safe (because the classifier isn’t going to accept them), and also it’s probably a better bet to look at things that the model thinks are 99.1% likely to be safe rather than 99.9%, because (assuming the model is calibrated) you’ll find errors 9x as often.
This seems wrong to me. You should want to label and train on snippets that your classifier thinks are 50% correct, because that is how you maximise information.

I don’t know how to argue this point since I don’t know what the crux behind the disagreement is, but I’ll try to throw out some words...
If safeness were a continuous number and you wanted solutions that are safe enough, it would be more reasonable to focus most training around the cutoff point. Although wider training data probably leads to better generalisation, so I would include that too.
But safety is not a continuous number. It’s binary in your setup: either someone is hurt or not. When you run the classifier you want some extra safety, which you get by raising the threshold. But when you train you just want to reduce uncertainty. Things that the classifier thinks are 99% safe are not inherently 99% safe. They are either safe or not. So focusing your training around the threshold doesn’t make any sense.

Another way to say this is that the uncertainty is in the model, not in the world. There are going to be snippets that the model is less than 99% sure about, but that are actually perfectly safe and could be valuable training data.
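
The numbers behind both positions can be checked with a small sketch. It assumes a perfectly calibrated classifier, so that a snippet scored `p_safe` is truly unsafe with probability `1 - p_safe`. The function names are hypothetical and not part of Redwood's setup. It shows the "9x" ratio from the quoted passage and the label entropy that the 50% proposal maximises.

```python
import math

def error_rate(p_safe: float) -> float:
    """P(snippet is actually unsafe) given its score, assuming calibration."""
    return 1.0 - p_safe

def label_entropy_bits(p_safe: float) -> float:
    """Binary entropy (in bits) of the true label given the classifier score."""
    if p_safe in (0.0, 1.0):
        return 0.0
    q = 1.0 - p_safe
    return -(p_safe * math.log2(p_safe) + q * math.log2(q))

for p in (0.5, 0.991, 0.999):
    print(f"score={p:.3f}  P(unsafe)={error_rate(p):.3f}  "
          f"entropy={label_entropy_bits(p):.3f} bits")

# How much more often a label at 99.1% reveals an error than one at 99.9%.
ratio = error_rate(0.991) / error_rate(0.999)
print(f"error ratio 99.1% vs 99.9%: {ratio:.1f}")
```

Entropy peaks at a score of 50%, which is the sense in which labeling there "maximises information". The ratio of error rates near the threshold is the separate quantity the quoted passage cares about.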
- gwern 24 Apr 2022 16:58 UTC
  LW: 7 AF: 3
  0
  AF Parent
  
  You should want to label and train on snippets that your classifier thinks are 50% correct, because that is how you maximise information.
  
  You don’t want to ‘maximize information’ (or minimize variance). You want to minimize the number of errors you make at your decision-threshold. Your threshold is not at 50%, it’s at 99%. Moving an evil sample from 50% to 0% is of zero intrinsic value (because you have changed the decision from ‘Reject’ to ‘Reject’ and avoided 0 errors). Moving an evil sample from 99.1% to 98.9% is very valuable (because you have changed the decision from ‘Accept’ to ‘Reject’ and avoided 1 error). Reducing the error on regions of data vastly far away from the decision threshold, such as deciding whether a description of a knifing is ever so slightly more ‘violent’ than a description of a shooting and should be 50.1% while the shooting is actually 49.9%, is an undesirable use of labeling time.
  - Linda Linsefors 25 Apr 2022 9:36 UTC
    LW: 8 AF: 4
    0
    AF Parent
    The correct labeling of how violent a knifing is, is not 50.1% or 49.9%. The correct label is 0 or 100%. There is no “ever so slightly” in the training data. The percentage is about the uncertainty of the classifier; it is not about degrees of violence in the sample. If it were the other way around, then I would mostly agree with the current training scheme, as I said.
    
    If the model is well calibrated, then at 50% half the samples would be safe and half violent. Moving the safe ones up is helpful. Decreasing misclassification of safe samples will increase the chance of outputting something safe.
    
    Decreasing the uncertainty from 50% to 0 for an unsafe sample doesn’t do anything for that sample. But it does help in learning good from bad in general, which is more important.
    - axioman 23 Jun 2022 18:10 UTC
      1 point
      0
      Parent
      ~~I think the actual solution is somewhere in between:~~ If we assume calibrated uncertainty, ignore generalization and assume we can perfectly fit the training data, the total cost should be reduced by (1-the probability assigned to the predicted class) * the cost of misclassifying the not predicted (minority) class as the predicted one (majority): If our classifier already predicted the right class, nothing happens, but otherwise we change our prediction to the other class and reduce the total cost.
      
      While this does not depend on the decision threshold, it does depend on the costs we assign to different misclassifications (in the special case of equal costs, the maximal probability that can be reached by the minority/non-predicted class is 0.5).
      Edit: This was wrong, the decision threshold is still implicit at 50% in the first paragraph (as cued by the words “majority” and “minority”): If you apply a 99% decision threshold on a calibrated model, the highest probability you can get for “input is actually unsafe” if your threshold model predicts “safe” is 1%; (now) obviously, you only get to move examples from predicted “unsafe” to predicted “safe” if you sample close to the 50% threshold, which does not give you much if falsely labelling things as unsafe is not very costly compared to falsely labelling things as safe.
      If we however assume that retraining will only shift the prediction probability by epsilon rather than fully flipping the label, we want to minimize the cost from above, subject to only targeting predictions that are epsilon-close to the threshold (as otherwise there won’t be any label flip). In the limit of epsilon->0, we thus should target the prediction threshold rather than 50% (independent of the cost).
      
      In reality, the extent to which predictions will get affected by retraining is certainly more complicated than suggested by these toy models (and we are still only greedily optimizing and completely ignoring generalization). But it might still be useful to think about which of these assumptions seems more realistic.
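
One way to make the two toy models above concrete is the sketch below. It is only one reading of them: it assumes a calibrated classifier and a 99% acceptance threshold, and the cost values, the `epsilon` value and the function names are all hypothetical. The first function is the "label fully flips" model, where a label only helps if it contradicts the current decision. The second is the "prediction moves by at most epsilon" model, where only samples near the threshold can change decision at all.

```python
def expected_saving(p_safe: float,
                    threshold: float = 0.99,
                    cost_false_accept: float = 100.0,  # hypothetical cost
                    cost_false_reject: float = 1.0     # hypothetical cost
                    ) -> float:
    """Expected cost removed by labeling one sample, full-flip model.

    Accepted samples (p_safe >= threshold) are wrong with probability
    1 - p_safe; rejected samples are wrong with probability p_safe.
    """
    if p_safe >= threshold:
        return (1.0 - p_safe) * cost_false_accept
    return p_safe * cost_false_reject

def can_flip(p_safe: float,
             threshold: float = 0.99,
             epsilon: float = 0.005) -> bool:
    """Epsilon model: retraining moves a score by at most epsilon, so only
    samples within epsilon of the threshold can change decision."""
    return abs(p_safe - threshold) <= epsilon

for p in (0.5, 0.991, 0.999):
    print(f"score={p:.3f}  expected_saving={expected_saving(p):.3f}  "
          f"can_flip={can_flip(p)}")
```

Under these assumed costs, a label at 99.1% removes more expected cost than one at 50% or at 99.9%. With cheaper false accepts or costlier false rejects the ordering in the full-flip model can change, whereas in the epsilon model only the near-threshold sample can flip regardless of the costs.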