I agree with the criticisms of literal GOFAI here, but I can imagine a kind of pseudo-GOFAI agenda plausibly working. Classical logic is probably hopeless for this, for the reasons you outline (real-world fuzziness), but it still seems an open question whether there’s some mathematical formalism with which you can reason about the input-output mapping.
I would gesture at dynamical systems analysis of RNNs and circuit-based interpretability as the kinds of things that could enable this. For example, perhaps a model has learned to perform addition using a bag of heuristics, and you notice that there’s a better set of heuristics it didn’t learn, for path-dependent training reasons (e.g. clock and pizza). This would then enable the same kind of labor-intensive improvement through explicit reasoning about representations rather than end-to-end training.
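To make the addition example concrete, here is a minimal sketch of the “clock” mechanism for modular addition, the kind of circuit that interpretability work has recovered from small networks trained on this task: operands are embedded as angles on a circle, and adding them corresponds to composing rotations. The modulus `p` and frequency `k` below are illustrative choices of mine, not taken from any particular model.

```python
import numpy as np

# Sketch of the "clock" algorithm for computing (a + b) mod p.
# p and k are illustrative; a trained network typically uses several frequencies.
p = 59   # modulus
k = 7    # one frequency the network might use (coprime to p)

def clock_logits(a: int, b: int) -> np.ndarray:
    """Return a logit for each candidate answer c in [0, p)."""
    theta_a = 2 * np.pi * k * a / p        # embed a as an angle on the circle
    theta_b = 2 * np.pi * k * b / p        # embed b as an angle on the circle
    theta_c = 2 * np.pi * k * np.arange(p) / p
    # cos(theta_a + theta_b - theta_c) is maximized exactly when c = (a + b) mod p,
    # because rotating by theta_a and then theta_b lands on the angle for a + b.
    return np.cos(theta_a + theta_b - theta_c)

a, b = 40, 33
pred = int(np.argmax(clock_logits(a, b)))
assert pred == (a + b) % p
print(pred)  # 14
```

The point of the pseudo-GOFAI move is that once you can write the mechanism down at this level, you can reason about it directly: compare it to an alternative set of heuristics, or edit the representation, rather than relying on further end-to-end training.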
It’s not clear to me that this will work, but the challenge is to articulate explicitly which properties of the function from inputs to outputs render it impossible. I don’t think fuzziness alone does it, the way it does for classical logic, because the mathematical structures involved might be compatible with fuzziness. Maybe the mechanisms in your model aren’t “local enough”, in that they play a role across too much of your input distribution to edit without catastrophic knock-on effects. Maybe the mechanisms are intrinsically high-dimensional in a way that makes them hard to reason about as mechanisms. And of course, maybe it’s just never more efficient than end-to-end training.