The MNIST CNN was trained only on the 50k training examples.
I did not guarantee that the models had perfect train accuracy. I don’t believe they did.
I think that any interpretability tools are allowed. Saliency maps are fine. But to ‘win,’ a submission needs to come with a mechanistic explanation and sufficient evidence for it. It is possible to beat this challenge by using non-mechanistic techniques to figure out the labeling function, and then using that knowledge to find the mechanisms by which the networks classify the data.
At the end of the day, I (and possibly Neel) will have the final say in things.
Thanks :)