I’ve been thinking about interpretable models. If we have some system making decisions for us, it seems good if we can ask it “Why did you suggest action X?” and get back something intelligible.
So I read up about what sorts of things other people have come up with. Something that seemed cool was this idea of tree regularization. The idea being that decision trees are sort of the standard for interpretable models because they typically make splits along features. You essentially train a regularizer (which is a neural net) which proxies average tree length (i.e. the complexity of a decision tree which is comparable to the actual model you’re training). Then, when you’re done, you can train a new decision tree which mimics the final neural net (the one you trained with the regularizer).
The author pointed out that, in the process of doing so, you can see what features the model thinks are relevant. Sometimes they don’t make sense, but the whole point is that you can at least tell that they don’t make sense (from a human perspective) because the model is less opaque. You know more than just “well, it’s a linear combination of the inputs, followed by some nonlinear transformations, repeated a bunch of times”.
But if the features don’t seem to make sense, I’d still like to know why they were selected. If the system tells us “I suggested decision X because of factors A, B, and C” and C seems really surprising to us, I’d like to know what value it’s providing to the prediction.
I’m not sure what sort of justification we could expect from the model, though. Something like “Well, there was this regularity that I observed in all of the data you gave me, concerning factor C,” seems like what’s happening behind the scenes. Maybe that’s a sign for us to investigate more in the world, and the responsibility shouldn’t be on the system. But, still, food for thought.
I’ve been thinking about interpretable models. If we have some system making decisions for us, it seems good if we can ask it “Why did you suggest action X?” and get back something intelligible.
So I read up about what sorts of things other people have come up with. Something that seemed cool was this idea of tree regularization. The idea being that decision trees are sort of the standard for interpretable models because they typically make splits along features. You essentially train a regularizer (which is a neural net) which proxies average tree length (i.e. the complexity of a decision tree which is comparable to the actual model you’re training). Then, when you’re done, you can train a new decision tree which mimics the final neural net (the one you trained with the regularizer).
The author pointed out that, in the process of doing so, you can see what features the model thinks are relevant. Sometimes they don’t make sense, but the whole point is that you can at least tell that they don’t make sense (from a human perspective) because the model is less opaque. You know more than just “well, it’s a linear combination of the inputs, followed by some nonlinear transformations, repeated a bunch of times”.
But if the features don’t seem to make sense, I’d still like to know why they were selected. If the system tells us “I suggested decision X because of factors A, B, and C” and C seems really surprising to us, I’d like to know what value it’s providing to the prediction.
I’m not sure what sort of justification we could expect from the model, though. Something like “Well, there was this regularity that I observed in all of the data you gave me, concerning factor C,” seems like what’s happening behind the scenes. Maybe that’s a sign for us to investigate more in the world, and the responsibility shouldn’t be on the system. But, still, food for thought.