Lucius Bushnaq comments on Coordinate-Free Interpretability Theory

• Curious how looking at properties of the functions the embed through their activation patterns fits into this picture.

For example, take the L2 norms of the activations of all entries of , averaged over some set of network inputs. The sum and product of those norms will both be coordinate independent.

In fact, we can go one step further, and form , the matrix of the L2 inner products of all the layer base elements with each other. The eigendecomposition of this matrix is also coordinate independent, up to degeneracy in the eigenvalues.

(This eigenbasis also sure looks like a uniquely determined basis to me)

You can think of these quantities as measures of how many “unique” activation patterns exist in the layer, and how “large” each of them is.
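
As a rough illustration of the quantities above, here is a minimal NumPy sketch. Everything in it is an assumption for the sake of the example rather than something from the post: the activations are random data in an array `acts` of shape (inputs, neurons), the L2 inner product of two neurons is taken to be the input-averaged product of their activations, and the coordinate change is a random orthogonal matrix. Under those assumptions the eigenvalues of the inner-product matrix, and so their sum (the trace) and product (the determinant), are unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1000 network inputs, 4 neurons with correlated activations.
acts = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))


def gram(a):
    """Matrix of input-averaged inner products between neuron activations."""
    return a.T @ a / a.shape[0]


# Random orthogonal matrix (rotation, possibly with a reflection) via QR.
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
rotated = acts @ q

eig_original = np.linalg.eigvalsh(gram(acts))
eig_rotated = np.linalg.eigvalsh(gram(rotated))

# The spectrum is the same in both coordinate systems.
spectrum_invariant = np.allclose(eig_original, eig_rotated)
print(spectrum_invariant)
```

Note that the sketch only covers orthogonal changes of basis; it says nothing about stretching or nonlinear reparameterisations.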

In your framing, does this correspond to adding in topological information from all the previous layers, through the mapping ?

• For example, take the L2 norms of the activations of all entries of , averaged over some set of network inputs. The sum and product of those norms will both be coordinate independent.

That would be true if the only coordinate changes we consider are rotations. But the post is talking about much more general transformations than that: we’re allowing not only general linear transformations (i.e. stretching in addition to rotations), but also nonlinear transformations (which is why ReLUs don’t give a preferred coordinate system).
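
A small sketch of the linear half of this objection, under assumed toy data (random activations in a hypothetical array `acts` of shape (inputs, neurons), with the inner product taken as an average over inputs): stretching a single axis already changes the eigenvalues of the inner-product matrix, so they are not invariant under general linear maps. The nonlinear case is not covered here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activations: 1000 inputs, 3 roughly independent neurons.
acts = rng.normal(size=(1000, 3))


def gram_eigenvalues(a):
    """Eigenvalues of the input-averaged inner-product matrix of the neurons."""
    return np.linalg.eigvalsh(a.T @ a / a.shape[0])


# Stretch the first coordinate by a factor of 3 and leave the others alone.
stretch = np.diag([3.0, 1.0, 1.0])
stretched = acts @ stretch

eig_original = gram_eigenvalues(acts)
eig_stretched = gram_eigenvalues(stretched)

# Unlike a rotation, the stretch changes the spectrum.
spectrum_changed = not np.allclose(eig_original, eig_stretched)
print(spectrum_changed)
```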

• Ah, right, you did mention polar coordinates.

Hm, stretching seems handleable. How about also using the weight matrix, for example? Change into the eigenbasis above, then apply stretching to make all L2 norms size 1 or size 0. Then look at the weights, as stretching-and-rotation invariant quantifiers of connectedness?

Maybe that doesn’t make much sense when considering non-linear transformations though.
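
One way to read the proposal above, as a sketch under stated assumptions rather than a worked-out method: move to the eigenbasis of the inner-product matrix, rescale every direction to unit norm, express the next layer's weights in those coordinates, and look at their singular values. The arrays `acts` and `weights`, the helper `whitened_singular_values`, and the choice of singular values as the "connectedness" quantity are all hypothetical. The sketch also assumes the inner-product matrix has full rank, so no direction is set to zero. In this toy setup the singular values come out the same after an arbitrary invertible linear change of coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer: activations (inputs, neurons) and next-layer weights.
acts = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))
weights = rng.normal(size=(2, 3))  # maps 3 neurons to 2 outputs


def whitened_singular_values(a, w):
    """Singular values of w expressed in whitened activation coordinates."""
    g = a.T @ a / a.shape[0]
    eigvals, eigvecs = np.linalg.eigh(g)
    # Whitened coordinates are z = diag(eigvals**-0.5) @ eigvecs.T @ activation,
    # so the weights acting on z are w @ eigvecs @ diag(eigvals**0.5).
    w_whitened = (w @ eigvecs) * np.sqrt(eigvals)
    return np.linalg.svd(w_whitened, compute_uv=False)


# Arbitrary invertible linear change of coordinates (rotation plus stretching).
change = rng.normal(size=(3, 3))
acts_new = acts @ change.T                     # each activation a -> change @ a
weights_new = weights @ np.linalg.inv(change)  # keeps weights @ a unchanged

sv_original = whitened_singular_values(acts, weights)
sv_new = whitened_singular_values(acts_new, weights_new)

invariant = np.allclose(sv_original, sv_new)
print(invariant)
```

The invariance here is just the identity (W A⁻¹)(A G Aᵀ)(W A⁻¹)ᵀ = W G Wᵀ, so it holds only for linear coordinate changes.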

• I think that’s the same as finding a low-rank decomposition, assuming I correctly understand what you’re saying?

• Sai, who is a lot more topology-savvy than me, now suspects that there is indeed a connection between this norm approach and the topology of the intermediate set. We’ll look into this.