Hm. I wonder what an “alternative” to neural nets and gradient descent would look like. Neural nets are really just there as a highly expressive model class that gradient descent works on.
One big difficulty is that if your model is going to classify pictures of cats (or go boards, etc.), it’s going to be pretty darn complicated, and I’m sceptical that any choice of model class is going to prevent that. But maybe one could try to “hide” this complexity in a recursive structure. Neural nets already do this, but convnets especially mix up spatial hierarchy with logical hierarchy, and nns in general aren’t as nicely packaged into human-thought-sized pieces as maybe they could be—consider resnets, which work well precisely because they abandon the pretense of each neuron being some specific human-scale logical unit.
So maybe you could go the opposite direction and make that pretense a reality with some kind of model class that tries to enforce “human-thought-sized” reused units with relatively sparse inter-unit connections? Could still train with SGD, or treat hypotheses as decision trees and take advantage of that literature.
But suppose we got such a model class working, and trained it to recognize cats. Would it actually be human-comprehensible? Probably not! I guess I’m just not really clear on what “designed for transparency and alignability” is supposed to cash out to at this stage of the game.
Hm. I wonder what an “alternative” to neural nets and gradient descent would look like. Neural nets are really just there as a highly expressive model class that gradient descent works on.
One big difficulty is that if your model is going to classify pictures of cats (or go boards, etc.), it’s going to be pretty darn complicated, and I’m sceptical that any choice of model class is going to prevent that. But maybe one could try to “hide” this complexity in a recursive structure. Neural nets already do this, but convnets especially mix up spatial hierarchy with logical hierarchy, and nns in general aren’t as nicely packaged into human-thought-sized pieces as maybe they could be—consider resnets, which work well precisely because they abandon the pretense of each neuron being some specific human-scale logical unit.
So maybe you could go the opposite direction and make that pretense a reality with some kind of model class that tries to enforce “human-thought-sized” reused units with relatively sparse inter-unit connections? Could still train with SGD, or treat hypotheses as decision trees and take advantage of that literature.
But suppose we got such a model class working, and trained it to recognize cats. Would it actually be human-comprehensible? Probably not! I guess I’m just not really clear on what “designed for transparency and alignability” is supposed to cash out to at this stage of the game.