In practice, most neural or Bayesian program synthesis applications I’m aware of exhibit something like structural interpretability, because the hypothesis space they live in is designed by modelers to have human-interpretable semantic structure. Two good examples of this…
When I squint out towards the horizon, I see future researchers trying to do a Bayesian program synthesis thing that builds a generative model of the whole world—everything from “tires are usually black”, to “it’s gauche to wear white after labor day”, to “in this type of math problem, maybe try applying the Cauchy–Schwarz inequality”, etc. etc. etc.
I’m perfectly happy to believe that Lake et al. can program-synthesis a little toy generative model of handwritten characters such that it has structural interpretability. But I’m concerned that we’ll work our way up to the thing in the previous paragraph, which might be a billion times more complicated, and it will no longer have structural interpretability.
(And likewise I’m concerned that solutions to “uninformative variable names” won’t scale—e.g., how are we going to automatically put English-language labels on the various intuitive models / heuristics that are involved when Ed Witten is thinking about math, or when MLK Jr is writing a speech?)
I’m more optimistic about the interpretability prospects of learning approaches that generate models or code that look like traditional symbolic programs, relative to end-to-end deep learning approaches [emphasis added]
Nominally, I agree with this. But “relative to” is key here.
Your takeaway seems to be “OK, great, let’s do probabilistic generative models, they’re better!”.
By contrast, my perspective is: “If we take the probabilistic generative model approach, we’re in huge trouble with respect to interpretability, oh man this is really really bad, we gotta work on this ASAP!!! (Oh and by the way if we take the deep net approach then it’s even worse.)”.
On the contrary, I think there exist large, complex, symbolic models of the world that are far more interpretable and useful than learned neural models, even if too complex for any single individual to understand, e.g.:
- The Unity game engine (a configurable model of the physical world) - Pixar’s RenderMan renderer (a model of optics and image formation) - The GLEAMviz epidemic simulator (a model of socio-biological disease spread at the civilizational scale)
Humans are capable of designing and building these models, and learning how to build/write them as they improve their understanding of the world. The difficult part is how we can recapitulate that ability—program synthesis is only in its infancy in it’s ability to do so, but IMO contemporary end-to-end deep learning methods seem unlikely to deliver here if want both interpretability and usefulness.
I agree that gwern’s proposal “Any model simple enough to be interpretable is too simple to be useful” is an exaggeration. Even the Lake et al. handwritten-character-recognizer is useful.
I would have instead said “Any model simple enough to be interpretable is too simple to be sufficient for AGI”.
I notice that you are again bringing the discussion back to a comparison between program synthesis world-models versus deep learning world-models, whereas I want to talk about the possibility that neither would be human-interpretable by the time we reach AGI level.
Thanks for your reply!
When I squint out towards the horizon, I see future researchers trying to do a Bayesian program synthesis thing that builds a generative model of the whole world—everything from “tires are usually black”, to “it’s gauche to wear white after labor day”, to “in this type of math problem, maybe try applying the Cauchy–Schwarz inequality”, etc. etc. etc.
I’m perfectly happy to believe that Lake et al. can program-synthesis a little toy generative model of handwritten characters such that it has structural interpretability. But I’m concerned that we’ll work our way up to the thing in the previous paragraph, which might be a billion times more complicated, and it will no longer have structural interpretability.
(And likewise I’m concerned that solutions to “uninformative variable names” won’t scale—e.g., how are we going to automatically put English-language labels on the various intuitive models / heuristics that are involved when Ed Witten is thinking about math, or when MLK Jr is writing a speech?)
Nominally, I agree with this. But “relative to” is key here.
Your takeaway seems to be “OK, great, let’s do probabilistic generative models, they’re better!”.
By contrast, my perspective is: “If we take the probabilistic generative model approach, we’re in huge trouble with respect to interpretability, oh man this is really really bad, we gotta work on this ASAP!!! (Oh and by the way if we take the deep net approach then it’s even worse.)”.
We could probably use a term or a phrase for this concept since it keeps coming up and is a fundamental problem. How about:
Corollary:
On the contrary, I think there exist large, complex, symbolic models of the world that are far more interpretable and useful than learned neural models, even if too complex for any single individual to understand, e.g.:
- The Unity game engine (a configurable model of the physical world)
- Pixar’s RenderMan renderer (a model of optics and image formation)
- The GLEAMviz epidemic simulator (a model of socio-biological disease spread at the civilizational scale)
Humans are capable of designing and building these models, and learning how to build/write them as they improve their understanding of the world. The difficult part is how we can recapitulate that ability—program synthesis is only in its infancy in it’s ability to do so, but IMO contemporary end-to-end deep learning methods seem unlikely to deliver here if want both interpretability and usefulness.
I agree that gwern’s proposal “Any model simple enough to be interpretable is too simple to be useful” is an exaggeration. Even the Lake et al. handwritten-character-recognizer is useful.
I would have instead said “Any model simple enough to be interpretable is too simple to be sufficient for AGI”.
I notice that you are again bringing the discussion back to a comparison between program synthesis world-models versus deep learning world-models, whereas I want to talk about the possibility that neither would be human-interpretable by the time we reach AGI level.