Something like “We have mapped out the possible human-understandable or algorithmically neat descriptions of the network’s behavior sufficiently comprehensively and sampled from this space sufficiently comprehensively to know that the probability that there’s a description of its behavior that is meaningfully shorter than the shortest one of the ones that we’ve found is at most ϵ.”.
I’m not convinced that you can satisfy either of those “sufficiently comprehensively” such that you’d be comfortable arguing your model is not somehow interpretable
Something like “We have mapped out the possible human-understandable or algorithmically neat descriptions of the network’s behavior sufficiently comprehensively and sampled from this space sufficiently comprehensively to know that the probability that there’s a description of its behavior that is meaningfully shorter than the shortest one of the ones that we’ve found is at most ϵ.”.
Yeah, seems hard
I’m not convinced that you can satisfy either of those “sufficiently comprehensively” such that you’d be comfortable arguing your model is not somehow interpretable
I’m not claiming it’s feasible (within decades). That’s just what a solution might look like.