Mm, I fear this argument is self-contradictory to a significant extent.
Interpretability is premised on the idea that it’s possible to reduce a “connectionist” system to a more abstract, formalized representation.
Consider the successful interpretation of a curve detector. Once we know what function a bunch of neurons implements, we can tear out these neurons, implement that function in a high-level programming language, then splice that high-level implementation into the NN in place of the initial bunch-of-neurons. If the interpretation is correct, the NN’s behavior won’t change.
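To make the splice concrete, here’s a rough sketch of how it could be done in PyTorch (the layer name, channel indices, and hand-coded function are all made up for illustration):

```python
# Hypothetical sketch: swap a group of "curve detector" channels for a
# hand-coded reimplementation of the function we believe they compute.
# If the interpretation is correct, the model's outputs shouldn't change.
import torch

CURVE_CHANNELS = [17, 42, 99]  # made-up indices of the interpreted neurons

def hand_coded_curve_detector(x):
    # Stand-in for the high-level reimplementation recovered by interpretability;
    # in reality this would be explicit code for "fire on curved edges".
    raise NotImplementedError

def splice_hook(module, inputs, output):
    # Overwrite only the interpreted channels, leaving the rest of the layer alone.
    patched = output.clone()
    patched[:, CURVE_CHANNELS] = hand_coded_curve_detector(inputs[0])
    return patched

# Attach to whichever layer hosts the interpreted neurons, then check that
# behavior is preserved on held-out inputs:
#   handle = model.layer3.register_forward_hook(splice_hook)
#   assert torch.allclose(model(x), outputs_before_splice, atol=1e-3)
```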
Scaling this trick up, the “full” interpretation of a NN should allow us to re-implement the entire network as a high-level program; I think Neel Nanda even did something similar here.
Redwood Research’s outline here agrees with this view. An “interpretation” of a NN is basically a transform of the initial weights-and-biases computational graph into a second, “simpler” higher-level computational graph.
So inasmuch as interpretability is possible, it implies the ability to transform a connectionist system into what’s basically GOFAI. And if we grant that, it’s entirely coherent to wish that we’ve followed the tech-development path where we’re directly figuring out how to build advanced AI in a high-level manner. Instead, we’re going about it in a round-about fashion: we’re generating incomprehensible black-boxes whose internals we’re then trying to translate into high-level representations.
The place where “connectionism” does outperform higher-level approaches is “blind development”. If we don’t know how the algorithm we want is supposed to work, only know what it should do, then such approaches may indeed be absolutely necessary. And in that view, it should be clear why evolution never stumbled on anything else: it has no idea what it’s doing, so of course it’d favour algorithms that work even if you have no idea what you’re doing.
(Though I don’t think brains’ uniformity is entirely downstream even of that. Computers can also be viewed as “a massive number of simple, uniform units connected to each other”, the units being transistors. Which suggests that it’s more of a requirement for general-purpose computational systems, imposed by the constraints of our reductionist universe. Not a constraint on the architecture of specifically intelligent systems.
Perhaps all good computational substrates have this architecture; but the software that’s implemented on these substrates doesn’t have to. And in the case of AI, the need for connectionism is already fulfilled by transistors, so AI should in principle be implementable via high-level programming that’s understandable by humans. There’s no absolute need for a second layer of connectionism in the form of NNs.)
I agree with your second point though, that complexity is largely a feature of problem domains, not of the agents navigating them. Agents’ policy functions are likely very simple, compared to agents’ world-models.
Generally speaking, that some already-learned ML algorithm can be transformed from one form into another does not imply that it could easily—or ever at all, in the lifetime of the universe—be discovered in that second form.
For instance, it looks like any ReLU-based neural network can be transformed into a decision tree, albeit potentially an extremely large one, while preserving exact functional equivalence. Nevertheless, for many (though not all) substantial learning tasks, it seems likely you will wait until the continents collide and the sun cools before you are able to find that algorithm with decision-tree-specific algorithms.
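As a toy illustration of why that equivalence holds (my own sketch, not taken from anywhere): a ReLU net computes a different linear function on each activation region, so you can branch on which hidden units are active and apply that region’s exact linear map at the leaf; the number of potential leaves grows as 2^(hidden units), which is where the “extremely large” comes from.

```python
# Toy sketch of the ReLU-net-to-decision-tree equivalence (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)   # 2 inputs, 3 hidden ReLU units
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)   # 1 output

def relu_net(x):
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2

def tree_equivalent(x):
    # "Tree": branch on which hidden units are active (the path through the tree),
    # then apply that activation region's exact linear map (the leaf).
    active = (W1 @ x + b1 > 0).astype(float)
    return (W2 * active) @ (W1 @ x + b1) + b2

x = rng.normal(size=2)
assert np.allclose(relu_net(x), tree_equivalent(x))
print("potential leaves for this net:", 2 ** len(b1))   # 2^hidden_units in general
```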
So, let’s for now assume that a completed interpretability research program allows you to transform a NN into a much more interpretable system with few moving parts, basically GOFAI—as you say. This in fact implies basically nothing about whether you could find such a system without neural networks (let alone whether the idea that “this just-find-GOFAI research program is doomed” would be self-contradictory). You say it’s coherent to wish for this tech tree, and it is coherent to so wish—but do you think this is a promising research program?
The place where “connectionism” does outperform higher-level approaches is “blind development”.
But—isn’t intelligence basically the ability to enter a domain blind and adapt to it? For men to be on the moon, octopuses to prank aquarium staff, ravens to solve weird puzzles. I’m not sure what you’re getting at here; you seem to be saying that connectionism is inferior to other solutions except where intelligence—i.e., quickly adapting to a domain we might not know much about, where we don’t know in advance what kind of algorithms will work internally, and where we don’t have adequate algorithms handed down from prior research or instinct—is required. Which is… what I’m saying?
I’m not sure what predictions you’re making that are different from mine, other than maybe “a research program that skips NNs and just tries to directly build the representations that they build up, without looking at NNs, has reasonable chances of success.” Which doesn’t seem like one you’d actually want to make.
I’m not sure what predictions you’re making that are different from mine, other than maybe “a research program that skips NNs and just tries to directly build the representations that they build up, without looking at NNs, has reasonable chances of success.” Which doesn’t seem like one you’d actually want to make.
I think I would, actually, want to make this prediction. The problem is that I’d want to make it primarily in the counterfactual world where the NN approach had been abandoned and/or declared off-limits, since in any world where both approaches exist, I would also expect the connectionist approach to reach dividends faster (as has occurred in e.g. our own world). This doesn’t make my position inconsistent with the notion that a GOFAI-style approach is workable; it merely requires that I think such an approach requires more mastery and is therefore slower (which, for what it’s worth, seems true almost by definition)!
I do, however, think that “building the high-level representations”, despite being slower, would not be astronomically slower than using SGD on connectionist models (which is what you seem to be gesturing at, with claims like “for many (though not all) substantial learning tasks, it seems likely you will wait until the continents collide and the sun cools before you are able to find that algorithm”). To be fair, you did specify that you were talking about “decision-tree-specific algorithms” there, which I agree are probably too crude to learn anything complex in a reasonable amount of time; but I don’t think the sentiment you express there carries over to all manner of GOFAI-style approaches (which is the strength of claim you would actually need for [what looks to me like] your overall argument to carry through).
(A decision-tree based approach would likely also take “until the continents collide and the sun cools” to build a working chess evaluation function from scratch, for example, but humans coded by hand what were, essentially, decision trees for evaluating positions, and achieved reasonable success until that approach was obsoleted by neural network-based evaluation functions. This seems like it reasonably strongly suggests that whatever the humans were doing before they started using NNs was not a completely terrible way to code high-level feature-based descriptions of chess positions, and that—with further work—those representations would have continued to be refined. But of course, that didn’t happen, because neural networks came along and replaced the old evaluation functions; hence, again, why I’d want primarily to predict GOFAI-style success in the counterfactual world where the connectionists had for some reason stopped doing that.)
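For a sense of what those hand-coded evaluation functions looked like: mostly material counts plus hand-picked positional feature bonuses, roughly like this toy sketch (the values are invented for illustration, not taken from any real engine):

```python
# Minimal sketch of a hand-coded, feature-based chess evaluation of the kind
# classic engines used before NN-based evaluation: material plus a crude
# positional bonus. Values here are illustrative only.
PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900, "K": 0}
CENTER = {"d4", "e4", "d5", "e5"}

def evaluate(board):
    """board: dict mapping squares like 'e4' to pieces like 'P' (white) or 'p' (black)."""
    score = 0
    for square, piece in board.items():
        value = PIECE_VALUES[piece.upper()]
        value += 20 if square in CENTER else 0   # toy positional feature
        score += value if piece.isupper() else -value
    return score  # positive favours white

print(evaluate({"e4": "P", "d5": "p", "g1": "N", "b8": "n"}))
```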
Mm, I think there’s some disconnect in what we mean by an “interpretation” of an ML model. The “interpretation” of a neural network is not just some computational graph that’s behaviorally equivalent to the neural network. It’s the actual algorithm found by the SGD and implemented in the network’s weights and biases. Again, see Neel Nanda’s work here. The “interpretation” recovers the actual computations the neural network’s forward pass is doing.
You seem to say that there’s some special class of “connectionist” algorithms that are qualitatively and mechanically different from higher-level algorithms. Interpretability is more or less premised on the idea that it is not so; that artificial neurons are just the computational substrate on which the SGD is invited to write programs. And interpretability is hard because we, essentially, have to recover the high-level structure of SGD-written programs given just (the equivalent of) their machine code. Not because we’re trying to find a merely-equivalent algorithm.
I think this also addresses your concern that higher-level design is not possible to find in a timely manner. SGD manages it, so the amount of computation needed is upper-bounded by whatever goes into a given training run. And the SGD is blind, so yes, I imagine deliberative design — given theoretical understanding of the domain — would be much faster than whatever the SGD is doing. (Well, maybe not faster in real-time, given that human brains work slower than modern processors. But in a shorter number of computation-steps.)
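To put a rough number on that upper bound, using the common ~6 × parameters × tokens estimate of transformer training FLOPs (the parameter and token counts below are hypothetical, purely for scale):

```python
# Back-of-the-envelope bound on "whatever goes into a given training run",
# using the standard ~6 * params * tokens estimate for transformer training FLOPs.
# Both numbers below are hypothetical, chosen only to give a sense of scale.
params = 7e9        # hypothetical parameter count
tokens = 2e12       # hypothetical training tokens
training_flops = 6 * params * tokens
print(f"~{training_flops:.1e} FLOPs")   # ~8.4e+22: the budget within which SGD found the algorithm
```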
You say it’s coherent to wish for this tech tree, and it is coherent to so wish—but do you think this is a promising research program?
Basically, yes.