PEER to logic/math functions: each PEER expert can be distilled into a mix of logical statements and mathematical functions. To do this I've drawn on KAN 2.0 and Differentiable Logic Gates. Why would I do this? Well, I'm hopeful it might be useful for interpretability, and it also seems possible to make the distilled model run inference quite efficiently on CPU.
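To make the Differentiable Logic Gates side concrete, here's a minimal sketch of a single soft logic gate in the style of differentiable logic gate networks: the gate holds a learnable distribution over the 16 two-input boolean functions, each relaxed to a real-valued form, and after training you harden it to the argmax function. This is my own illustrative numpy sketch (the function name `soft_gate` and the setup are assumptions, not code from the project or the DLG paper).

```python
import numpy as np

def soft_gate(a, b, logits):
    """One differentiable logic gate.

    a, b:    soft inputs in [0, 1]
    logits:  16 learnable scores, one per two-input boolean function

    Returns the softmax-weighted mixture of real-valued relaxations
    of all 16 functions; hardening = pick argmax(logits) as a
    discrete gate at inference time.
    """
    ops = np.array([
        0 * a,                  # FALSE
        a * b,                  # AND
        a - a * b,              # a AND NOT b
        a,                      # a
        b - a * b,              # NOT a AND b
        b,                      # b
        a + b - 2 * a * b,      # XOR
        a + b - a * b,          # OR
        1 - (a + b - a * b),    # NOR
        1 - (a + b - 2 * a * b),# XNOR
        1 - b,                  # NOT b
        1 - b + a * b,          # b implies a
        1 - a,                  # NOT a
        1 - a + a * b,          # a implies b
        1 - a * b,              # NAND
        0 * a + 1,              # TRUE
    ])
    w = np.exp(logits - logits.max())  # stable softmax
    w /= w.sum()
    return w @ ops
```

With logits sharply favoring one entry, the gate behaves like that discrete function (e.g. favoring index 1 recovers AND on boolean inputs), which is what makes the trained network readable as a circuit of logical statements.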
Why would this be useful for interpretability? I'm honestly not sure it would be: you can already represent INT4 networks with logic gates, and I'd guess that DLG networks break many of our interp tools; e.g. it's harder to run linear probes or inspect the residual stream and activations.
Vague handwaving: the KAN papers made it seem cool.
I dunno, I’m very much in the ‘try stuff and see what works’ stage of this project. Open to suggestions!