The post shows 32 words being tested. The 16 positive-valence words all have higher weight with normal tokenization; the 16 low-valence words have higher weight with unusual tokenization. This is extremely improbable by raw chance.
There’s no reason whatsoever to think that they are independent of each other. The very fact that you can classify them systematically as ‘positive’ or ‘negative’ valence indicates they are not, so you don’t know what ‘raw chance’ yields here. It might be quite probable.
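To make the disagreement concrete, here is a rough sketch of how much the "raw chance" figure depends on the independence assumption. It is a toy model, not anything from the post: the function name `p_all_match` and the coupling parameter `rho` are made up for illustration. With probability `rho`, the 16 words in a valence group share a single coin flip; otherwise each word flips its own fair coin. The two groups are assumed independent of each other.

```python
def p_all_match(rho: float, words_per_group: int = 16, groups: int = 2) -> float:
    """Probability that every word lands in the predicted direction.

    Toy assumption: with probability `rho` a whole valence group shares
    one fair coin flip; with probability 1 - rho each word in the group
    flips independently. Groups are treated as independent of each other.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be between 0 and 1")
    # One shared coin matches the predicted direction half the time.
    p_coupled = 0.5
    # Independent fair coins must all match the predicted direction.
    p_independent = 0.5 ** words_per_group
    p_group = rho * p_coupled + (1.0 - rho) * p_independent
    return p_group ** groups


if __name__ == "__main__":
    for rho in (0.0, 0.1, 0.5, 1.0):
        print(f"rho={rho:.1f}  P(all 32 match)={p_all_match(rho):.3g}")
```

At `rho = 0` this reduces to the fully independent case, 0.5 ** 32, which is the "extremely improbable" number. At `rho = 1` it is 0.25. Nothing here says what the real dependence structure is; the point is only that the answer spans many orders of magnitude depending on it.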
Right, I think that’s the hypothesis I was asking whether you had when I said
If so, yeah, this is compatible. I’d still put notably higher odds on the original thing I suggested, but this is the other main hypothesis.