On xor being represented incidentally:
I find experiments where you get <<50% val acc sketchy, so I quickly ran my own using a very artificial dataset: vectors in {-1,1}^d passed through 10 randomly initialized ReLU MLPs with skip connections. Here, the "features" I care about are the canonical directions in input space.
What I find:
XOR is not incidentally represented if the features are not redundant.
XOR is incidentally represented if the features are very redundant (which is often the case in Transformers, though maybe not to the extent needed for XOR to be incidentally represented). I create redundancy by using input vectors that are a concatenation of many copies of a smaller input vector.
See my code for more details: https://pastebin.com/LLjvaQLC
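For a rough sense of the setup, here is a minimal sketch of the kind of experiment described above. This is my own hypothetical reconstruction, not the linked code: the function names (`make_inputs`, `random_mlp`, `probe_acc`), the dimensions, and the probe training details are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_inputs(n, base_d, copies):
    # Base vectors in {-1,1}^base_d; redundancy comes from concatenating
    # `copies` copies of each base vector (copies=1 means no redundancy).
    base = rng.choice([-1.0, 1.0], size=(n, base_d))
    return np.tile(base, (1, copies)), base

def random_mlp(x, depth=10):
    # `depth` randomly initialized ReLU MLP blocks with skip connections.
    d = x.shape[1]
    h = x.copy()
    for _ in range(depth):
        w1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, d))
        w2 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, d))
        h = h + np.maximum(h @ w1, 0.0) @ w2  # residual / skip connection
    # Normalize rows so probe logits stay well-behaved numerically.
    return h / np.linalg.norm(h, axis=1, keepdims=True)

def probe_acc(h, y, steps=2000, lr=1.0):
    # Linear (logistic-regression) probe trained by plain gradient descent
    # on the first half of the data, evaluated on the held-out second half.
    n = len(y) // 2
    w, b = np.zeros(h.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(h[:n] @ w + b)))
        g = p - y[:n]
        w -= lr * h[:n].T @ g / n
        b -= lr * g.mean()
    return (((h[n:] @ w + b) > 0) == y[n:]).mean()

if __name__ == "__main__":
    for copies in (1, 16):  # non-redundant vs highly redundant features
        x, base = make_inputs(4000, 8, copies)
        h = random_mlp(x)
        y = (base[:, 0] != base[:, 1]).astype(float)  # XOR of features 0 and 1
        print(f"copies={copies:2d}  xor probe val acc = {probe_acc(h, y):.2f}")
```

The claim above corresponds to the probe accuracy staying near chance for `copies=1` and rising well above chance for large `copies`; exact numbers depend on dimensions, depth, and probe training.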
I think you might need to change permissions on your github repository?
Oops, here is the fixed link:
https://pastebin.com/LLjvaQLC