I wrote a book about a new philosophy of empirical science based on large-scale lossless data compression. I use the word “comperical” to express the idea of using the compression principle to guide an empirical inquiry. Though I developed the philosophy while thinking about computer vision (in particular the chronic, disastrous problems of evaluation in that field), I realized that it could also be applied to text. The resulting research program, which I call comperical linguistics, is something of a hybrid of linguistics and natural language processing, but (I believe) on much firmer methodological ground than either. I am now carrying out research in this area, AMA.
How do you expect this work to influence the fields of computer vision, NLP, etc. -- would it inspire new techniques?
First, I want people in computer vision and NLP to actually look at the data sets their algorithms apply to. Ask a physicist to tell you some facts about physical reality, and they will rattle off a lengthy list of concepts: conservation of energy, isotropy of spacetime, Ohm’s law, and so on. Ask a vision scientist to tell you some things about visual reality, and my guess is they won’t have much to say. Sure, a vision scientist can talk at length about algorithms, machine learning techniques, feature sets, and other computational tools, but they can’t tell you much about what’s actually in the images. The same problem holds for NLP people, to a lesser degree: they can talk about parsing algorithms and optimization procedures for finding MaxEnt parameters, but they can’t tell you much about the actual structure of text.
So, yes, I expect the approach to produce new techniques, but not because it supplies some kind of new mathematical framework. It suggests a new set of questions.
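To make the compression principle a bit more concrete, here is a minimal sketch (my own illustration, not code from the book) of how a model of text can be scored by the codelength it assigns to a corpus: an ideal coder needs -log2 p(x) bits per symbol, so a model that captures more of the text’s real structure yields a shorter lossless encoding. The corpus string and the character unigram model below are toy assumptions chosen purely for the example.

    import math
    from collections import Counter

    def codelength_bits(text, model_probs):
        """Total bits an ideal coder needs to encode `text` under a
        per-character probability model (dict: char -> probability)."""
        return sum(-math.log2(model_probs[ch]) for ch in text)

    def unigram_model(text, alphabet):
        """Add-one-smoothed character unigram model fit on `text` (toy example)."""
        counts = Counter(text)
        total = len(text) + len(alphabet)
        return {ch: (counts[ch] + 1) / total for ch in alphabet}

    corpus = "the cat sat on the mat and the dog sat on the log"
    alphabet = set(corpus)

    # A model that knows nothing about the text vs. one that has "noticed"
    # a simple fact about it (its character frequencies).
    uniform = {ch: 1 / len(alphabet) for ch in alphabet}
    unigram = unigram_model(corpus, alphabet)

    print(f"uniform model: {codelength_bits(corpus, uniform):.1f} bits")
    print(f"unigram model: {codelength_bits(corpus, unigram):.1f} bits")

The point of the toy example is the direction of the comparison: the model that has discovered a genuine regularity in the data is rewarded with a shorter codelength, and richer models of linguistic structure would compress further still.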