Would a tooling paper be appropriate for this workshop?
I wrote a tool that helps ML researchers analyze the internals of a neural network: https://github.com/FlorianDietz/comgra
It is not mechanistic interpretability research as such, but it could be useful to many people working in the field.
Intuitively it seems like a natural fit: everyone in mech interp needs to inspect models, and this tool makes inspecting models easier.
Does it need to be more specific than that?
One thing that comes to mind: the tool lets you assign training steps to categories, which you can define arbitrarily, and it records each category separately. This can be used to compare what the network does internally in two different scenarios of interest. E.g., you could categorize by “the race of the character in the story” or by some other real-life condition whose impact you want to know.
The tool then lets you quickly compare KPIs of tensors across the whole network for these categories. It’s less about testing a specific hypothesis and more about quickly getting an overview, building intuition, and finding anomalies.
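
To make the idea concrete, here is a minimal Python sketch of per-category recording and comparison. It is not comgra's actual API: `CategoryRecorder` and its methods are hypothetical names made up for illustration, tensors are represented as flat lists of floats, and mean absolute value stands in for whatever KPI you care about.

```python
from collections import defaultdict
from statistics import fmean


class CategoryRecorder:
    """Hypothetical helper for illustration only; NOT comgra's API."""

    def __init__(self):
        # _stats[category][tensor_name] -> list of per-step KPI values
        self._stats = defaultdict(lambda: defaultdict(list))

    def record(self, category, tensor_name, values):
        """Store one KPI (mean absolute value) for a tensor at one training step."""
        kpi = fmean([abs(v) for v in values])
        self._stats[category][tensor_name].append(kpi)

    def summary(self, category):
        """Average the recorded KPI over all steps in a category, per tensor."""
        return {name: fmean(kpis) for name, kpis in self._stats[category].items()}

    def compare(self, cat_a, cat_b):
        """Per-tensor KPI difference (cat_a minus cat_b) for tensors seen in both."""
        a, b = self.summary(cat_a), self.summary(cat_b)
        return {name: a[name] - b[name] for name in a.keys() & b.keys()}


# Usage: tag each training step with a category, then look for tensors
# whose KPI differs noticeably between the two scenarios.
recorder = CategoryRecorder()
recorder.record("scenario_a", "layer1.activation", [1.0, -3.0])
recorder.record("scenario_b", "layer1.activation", [0.5, -0.5])
differences = recorder.compare("scenario_a", "scenario_b")
```

A large entry in `differences` only says that a tensor behaves differently between the two categories. It is a pointer for where to look next, not evidence for any particular hypothesis.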