Current NN matrices are dense, with continuous-valued weights. A significant part of the difficulty of interpretability is that they have all-to-all connections; it is difficult to verify that one activation does or does not affect another activation.
However, we can quantize the weights to 3 bits, and then we can probably melt the whole thing into pure combinational logic. While I am not entirely confident that this form is strictly better from an interpretability perspective, it is differently difficult.
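To make the "melt into combinational logic" idea concrete, here is a minimal sketch: once a neuron's weights are snapped to a small set of 3-bit levels and its inputs are binary, the neuron computes a finite Boolean function that can be tabulated exhaustively. The quantization scheme, the step nonlinearity, and the 4-input size are all illustrative assumptions, not details from the comment above.

```python
import itertools
import numpy as np

def quantize_3bit(w):
    """Snap continuous weights to 8 evenly spaced levels (3 bits).

    Illustrative scheme: scale by the max magnitude, round to
    integers in [-4, 3], then map back to the original scale.
    """
    scale = np.max(np.abs(w))
    levels = np.clip(np.round(w / scale * 3.5), -4, 3)
    return levels * scale / 3.5

rng = np.random.default_rng(0)
w = rng.normal(size=4)      # one neuron with 4 binary inputs (assumed)
wq = quantize_3bit(w)

# With quantized weights and binary inputs, the neuron's behaviour is a
# finite function: enumerate every input pattern to get a truth table,
# i.e. pure combinational logic.
truth_table = {}
for bits in itertools.product([0, 1], repeat=4):
    pre_activation = np.dot(wq, bits)
    truth_table[bits] = int(pre_activation > 0)   # step nonlinearity

for bits, out in truth_table.items():
    print(bits, "->", out)
```

The resulting table can then be minimized with standard logic-synthesis tools, which is what makes the quantized form "differently difficult": the all-to-all continuous structure is gone, but you inherit the separate problem of making sense of a large Boolean circuit.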
“Giant inscrutable matrices” are probably not the final form of current NNs; we can potentially turn them into a different and nicer form.
While I am optimistic about simple algorithmic changes improving the interpretability situation (the difference between L1 and L2 regularization is a big source of hope here, for example), I think the difficulty floor is determined by the complexity of the underlying subject matter that needs to be encoded, and for LLMs / natural language that’s going to be very complex. (And if you use an architecture that can’t support things that are as complex as the underlying subject matter, the optimal model for that architecture will correspondingly have high loss.)
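A small sketch of why the L1 vs L2 distinction is a source of hope for interpretability: L1 regularization drives irrelevant weights to exactly zero, leaving a sparse model whose remaining connections are easier to inspect, while L2 only shrinks weights toward zero. The synthetic data, regularization strength, and proximal-gradient training loop below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]          # only 3 of 20 features matter (assumed)
y = X @ true_w + 0.1 * rng.normal(size=200)

def fit(lam, penalty, steps=2000, lr=0.01):
    """Gradient descent on least squares with an L1 or L2 penalty."""
    w = np.zeros(20)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
        if penalty == "l2":
            w = w - lr * lam * w       # plain weight decay: shrinks, never zeros
        else:
            # Proximal step for L1 (soft-thresholding): weights whose
            # gradient signal is below the threshold land at exactly 0.
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

w_l1 = fit(lam=0.1, penalty="l1")
w_l2 = fit(lam=0.1, penalty="l2")
print("exact zeros under L1:", int(np.sum(w_l1 == 0.0)))
print("exact zeros under L2:", int(np.sum(w_l2 == 0.0)))
```

The L1-trained model ends up with most of its 20 weights at exactly zero, so verifying that a given input does or does not affect the output reduces to checking a short list of nonzero connections; the L2-trained model keeps small nonzero weights everywhere, preserving the all-to-all verification problem.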