johnswentworth comments on And All the Shoggoths Merely Players

johnswentworth 27 Feb 2024 4:59 UTC
12 points
Third, the nontrivial prediction of 20 here is about “compactly describable errors.” “Mislabelling a large part of the time (but not most of the time)” is certainly a compactly describable error. You would then expect that as the probability of mistakes increased, you’d have a meaningful boost in generalization error, but that doesn’t happen. Easy Bayes update against #20. (And if we can’t agree on this, I don’t see what we can agree on.)
I indeed disagree with that, and I see two levels of mistake here. At the object level, there’s a mistake of not thinking through the gears. At the epistemic level, it looks like you’re trying to apply the “what would I have expected in advance?” technique of de-biasing, in a way which does not actually work well in practice. (The latter mistake I think is very common among rationalists.)
First, object-level: let’s walk through the gears of a mental model here. Model: train a model to predict labels for images, and it will learn a distribution of labels for each image (at least that’s how we usually train them). If we relabel 1’s as 7’s 20% of the time, then the obvious guess is that the model will assign about 20% probability (plus its “real underlying uncertainty”, which we’d expect to be small for large fully-trained models) to the label 7 when the digit is in fact a 1.
What does that predict about accuracy? That depends on whether the label we interpret our model as predicting is top-1, or sampled from the predictive distribution. If the former (as is usually used, and IIUC is used in the paper) then this concrete model would predict basically the curves we see in the paper: as noise ramps up, accuracy moves relatively little (especially for large fully-trained models), until the incorrect digit is approximately as probable as the correct digit, at which point accuracy plummets to ~50%. And once the incorrect digit is unambiguously more probable than the correct digit, accuracy drops to near-0.
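To make the top-1 versus sampling distinction concrete, here is a minimal sketch of that mental model. It rests on an idealized assumption of mine, not anything taken from the paper: the model has exactly learned the noisy label distribution, so on a true 1 it puts probability `noise` on the label 7 and `1 - noise` on the label 1, with no other uncertainty. The function names are hypothetical.

```python
def top1_accuracy(noise: float) -> float:
    """Accuracy on true 1s if we report the model's most probable label.

    Assumes (idealization) that the model assigns probability `noise` to the
    wrong label 7 and `1 - noise` to the correct label 1.
    """
    p_correct = 1.0 - noise
    if p_correct > noise:
        return 1.0  # correct label is the argmax, so every 1 is classified as 1
    if p_correct == noise:
        return 0.5  # exact tie, assumed to be broken at random
    return 0.0      # wrong label is the argmax, so every 1 is classified as 7


def sampled_accuracy(noise: float) -> float:
    """Accuracy on true 1s if we instead sample a label from the model's distribution."""
    return 1.0 - noise


if __name__ == "__main__":
    for noise in (0.0, 0.2, 0.4, 0.5, 0.6, 0.8):
        print(f"noise={noise:.1f}  top-1={top1_accuracy(noise):.2f}  "
              f"sampled={sampled_accuracy(noise):.2f}")
```

Under these assumptions, top-1 accuracy is a step function of the noise rate (flat, then a cliff where the two labels become equally probable), while sampled accuracy degrades linearly. A real model's residual uncertainty would smooth the cliff somewhat, but not remove it.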
The point: when we think through the gears of the experimental setup, the obvious guess is that the curves are mostly a result of top-1 prediction (as opposed to e.g. sampling from the predictive distribution), in a way which pretty strongly indicates that accuracy would plummet to near-zero as the correct digit ceases to be the most probable digit. And thinking through the gears of Yudkowsky’s #20, the obvious update is that predictable human-labeller-errors which are not the most probable labels are not super relevant (insofar as we use top-1 sampling, i.e. near-zero temperature) whereas human-labeller-errors which are most probable are a problem in basically the way Yudkowsky is saying. (… insofar as we should update at all from this experiment, which we shouldn’t very much.)
Second, epistemic-level: my best guess is that you’re ignoring these gears because they’re not things whose relevance you would have anticipated in advance, and therefore focusing on them in hindsight risks bias^[1]. Which, yes, it does risk bias.
Unfortunately, the first rule of experiments is You Are Not Measuring What You Think You Are Measuring. Which means that, in practice, the large majority of experiments which nominally attempt to test some model/theory in a not-already-thoroughly-understood-domain end up getting results which are mostly determined by things unrelated to the model/theory. And, again in practice, few-if-any people have the skill of realizing in advance which things will be relevant to the outcome of any given experiment. “Which things are we actually measuring?” is itself usually figured out (if it’s figured out at all) by looking at data from the experiment.
Now, this is still compatible with using the “what would I have expected in advance?” technique. But it requires that ~all the time, the thing I expect in advance from any given experiment is “this experiment will mostly measure some random-ass thing which has little to do with the model/theory I’m interested in, and I’ll have to dig through the details of the experiment and results to figure out what it measured”. If one tries to apply the “what would I have expected in advance?” technique, in a not-thoroughly-understood domain, without an overwhelming prior that the experimental outcome is mostly determined by things other than the model/theory of interest, then mostly one ends up updating in basically-random directions and becoming very confused.
1. ^
  Standard disclaimer about guessing what’s going on inside other people’s heads being hard, you have more data than I on what’s in your head, etc.
- TurnTrout 4 Mar 2024 18:37 UTC
  6 points
  The point: when we think through the gears of the experimental setup, the obvious guess is that the curves are mostly a result of top-1 prediction (as opposed to e.g. sampling from the predictive distribution), in a way which pretty strongly indicates that accuracy would plummet to near-zero as the correct digit ceases to be the most probable digit.
  I think this is a reasonable prediction, but ends up being incorrect:
  It decreases far faster than it should; on the top-1 theory, it should be ~flatlined for this whole graph (since for all $α > 0$ the strict majority of labels are still correct). Certainly top-5 should not be decreasing.
  - ryan_greenblatt 5 Mar 2024 19:15 UTC
    4 points
    This is in the data-constrained case, right?
    
    Maybe noise makes training worse because the model can’t learn to just ignore it due to insufficient data? (E.g., making training more noisy means convergence/compute efficiency is lower.)
    
    Also, does this decrease the size of the dataset by a factor of 5 in the uniform noise case? (Or did they normalize this by using a fixed set of labeled data and then just added additional noise labels?)