Ensuring interpretable models remain competitive is important. I’ve looked into the issue for dropout specifically. This paper disentangles the distinct regularization benefits dropout provides and shows that we can recover dropout’s contributions by adding an explicit regularization term to the loss and noise to the gradient updates (the paper derives expressions for both interventions).
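To make the shape of that intervention concrete, here is a toy sketch in plain NumPy. It trains a linear regression without any dropout masks, instead adding (1) an explicit penalty to the loss gradient and (2) Gaussian noise to each update. The penalty form used here is the classical input-dropout equivalence for linear models, $(p/(1-p))\sum_j \mathbb{E}[x_j^2]\,w_j^2$; the paper’s derived expressions for deep networks differ, so treat this as an illustration of the pattern, not the paper’s method. The noise scale is an arbitrary stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear regression y = X @ w_true + noise
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)

p = 0.5    # dropout rate being emulated
lr = 0.01
w = np.zeros(5)

# Classical explicit regularizer for input dropout on linear regression:
# (p / (1 - p)) * E[x_j^2] * w_j^2 per coordinate (not the paper's
# deep-net expressions -- an assumption for this sketch).
col_sq = (X ** 2).mean(axis=0)

for step in range(2000):
    resid = X @ w - y
    grad_loss = X.T @ resid / len(y)        # gradient of (1/2) * MSE
    grad_pen = (p / (1 - p)) * col_sq * w   # explicit regularization term
    noise = 0.01 * rng.normal(size=5)       # stand-in gradient noise
    w -= lr * (grad_loss + grad_pen + noise)

print(np.round(w, 2))
```

The point is structural: no stochastic masking ever happens, yet the update combines a deterministic penalty with injected noise, which is the decomposition the paper argues reproduces dropout’s effect.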
I think there’s a lot of room for high-performance, relatively interpretable deep models. E.g., the human brain is high-performance and seems much more interpretable than you’d expect from deep learning interpretability research. Given our limitations in accessing and manipulating the brain’s internal state, something like brain stimulation reward seems like it should be basically impossible if the brain were as uninterpretable as current deep nets.