In the case of MNIST, how good is the judge itself—for example, if you were to pick the six pixels optimally to give it the most information, how well would it perform?
I agree with Lanrian. A perhaps better metric is the chance that randomly selected pixels of a randomly selected image will cause the judge to guess the label correctly. This corresponds to “judge accuracy (random pixels)” in Table 2 of the original paper: 48.2% with 4 revealed pixels and 59.4% with 6.
What do you mean by picking pixels optimally? For nearly all images, I expect there to exist six pixels such that, if they are revealed, the judge identifies the correct label. That doesn’t seem like a meaningful metric, though.
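For concreteness, the “judge accuracy (random pixels)” metric is just an average over (image, random pixel subset) pairs. The sketch below shows how one might estimate it. Everything here is a stand-in: the synthetic data and the nearest-centroid `judge` are hypothetical placeholders for MNIST test images and the paper’s trained sparse-pixel judge network; only the sampling loop reflects the metric itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data (NOT MNIST): 10 classes of noisy 28x28 "images"
# around random centroids, just to make the script self-contained.
n_classes, n_per_class, n_pixels = 10, 50, 28 * 28
centroids = rng.normal(size=(n_classes, n_pixels))
images = np.repeat(centroids, n_per_class, axis=0) + 0.5 * rng.normal(
    size=(n_classes * n_per_class, n_pixels)
)
labels = np.repeat(np.arange(n_classes), n_per_class)

def judge(image, mask):
    # Hypothetical judge: nearest centroid, comparing only revealed pixels.
    # In the paper this would be a classifier trained on sparse inputs.
    dists = ((centroids[:, mask] - image[mask]) ** 2).sum(axis=1)
    return int(np.argmin(dists))

def random_pixel_accuracy(k, n_trials=2000):
    # Estimate P(judge correct | k uniformly random pixels revealed),
    # averaging over random images and random pixel subsets.
    correct = 0
    for _ in range(n_trials):
        i = rng.integers(len(images))
        mask = np.zeros(n_pixels, dtype=bool)
        mask[rng.choice(n_pixels, size=k, replace=False)] = True
        correct += judge(images[i], mask) == labels[i]
    return correct / n_trials

print(f"4 random pixels: {random_pixel_accuracy(4):.3f}")
print(f"6 random pixels: {random_pixel_accuracy(6):.3f}")
```

On this easy synthetic data the toy judge scores far above the 10% chance baseline; the paper’s 48.2%/59.4% figures come from the same kind of average, but with a real judge on real MNIST digits.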