We definitely didn’t answer all the prediction questions in this post, and we don’t have answers to all of them. I included some so it wouldn’t be obvious what exactly we had found.
Re: 2. Off the cuff, I’d estimate a 50% success rate for locally retargeting to the top-left and about 14% to the bottom-right, modifying ~11 activations (out of 32,768). If we also use the cheese vector (modifying all of the activations at the layer), those rates might go up further. I haven’t run the stats; this is just my sense of how it would play out.