The thing that remains confusing here is that for arbitrary features like these, it’s not obvious why the model is computing any nontrivial boolean function of them and storing it along a different direction. And if the answer is “the model computes this boolean function of arbitrary features” then the downstream consequences are the same, I think.
The thing that remains confusing here is that for arbitrary features like these, it’s not obvious why the model is computing any nontrivial boolean function of them and storing it along a different direction. And if the answer is “the model computes this boolean function of arbitrary features” then the downstream consequences are the same, I think.