This seems to happen a lot in computer vision—particularly object recognition—where the prevailing paradigm involves training the system on images taken from numerous perspectives, which then become exemplars for subsequent matching. The problem with this is that it just doesn’t scale. Having a large number of 2D training examples, then trying to match also in 2D, works for objects that are inherently flat—like the cover of a book—but not for objects with significant 3D structure, such as a chair. For more effective recognition, the system needs to capture the 3D essence of the object, not its innumerable 2D shadows.