Search Engines and Oracles

Some time ago, I was following a conversation about Wolfram Alpha (http://www.wolframalpha.com/), an attempt to implement a sort of general purpose question answerer, something people have dreamed about computers doing for decades. Although it is theoretically possible to find out virtually anything from the Internet, we seem pretty far from any plausible approximation of this dream (at least for general consumption). My first attempt was:

Q: “who was the first ruler of russia?”

A: Vladimir Putin

It’s a problematic question that depends on questions like “When did Russia become Russia?”, or “What do we count, historically, as Russia?”, or even what one means by “ruler”. A reasonably satisfactory answer would have to be fairly complicated; either that, or the question would have to be reworded so precisely that one name could serve as the answer.

On another problematic question I thought it did rather well:

Q: what is the airspeed velocity of an unladen african swallow?

What occurred to me, though, is that computer science could do something quite useful that lies between a “general purpose question answerer” and the old database paradigm of terms ANDed or ORed together. (Note that what Google does is neither of these, nor should it be placed on a straight line between the two, but a discussion of Google would take me far off topic.)

A simple example of what I’d really like is a search engine that matches *concepts*. Does anyone know of such a thing? If it exists, I should possibly read about it and shut up, but let me at least try to be sure I’m making the idea clear:

E.g., I’d like to enter <<rulers of russia>>, and get a list of highly relevant articles.

Or, I’d like to enter <<repair of transmission of “1957 Ford Fairlane”>> and get few if any useless advertisements, and something much better than all articles containing the words “repair” “transmission” and “1957 Ford Fairlane”—e.g., *not* an article on roof repair that happened to mention that “My manual transmission Toyota truck rear-ended a 1957 Ford Fairlane”.
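To make the failure concrete, here is a minimal sketch of the plain "all terms present" matching described above. The function names and the sample article text are made up for illustration, and the quoted phrase is treated loosely as a set of words rather than an exact phrase.

```python
import re

def tokens(text):
    """Lowercase a text and return its words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches_all_terms(query_terms, document):
    """True if the words of every query term appear somewhere in the document."""
    doc_tokens = tokens(document)
    return all(tokens(term) <= doc_tokens for term in query_terms)

# Hypothetical article text built around the example sentence above.
roof_article = (
    "Notes on roof repair. My manual transmission Toyota truck "
    "rear-ended a 1957 Ford Fairlane."
)
query = ["repair", "transmission", "1957 Ford Fairlane"]

print(matches_all_terms(query, roof_article))  # True
```

The roof repair article is accepted because nothing in the test asks whether "repair" has anything to do with "transmission".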

It seems to me that merely implementing a few useful connectives like “of”, perhaps recognizing adjective-noun phrases, and applying some heuristics like expanding words into *OR*ed lists of synonyms (ruler ==> (president OR king OR dictator …)) would yield quite an improvement over the search engines I’m familiar with.
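A rough sketch of how these heuristics might fit together follows. Everything in it is an assumption made for illustration: the synonym table is hand-written, the plural handling is crude, and "of" is honored only by requiring that all parts of the query turn up in the same sentence, which is a stand-in for real grammatical analysis rather than a claim about how any existing engine works.

```python
import re

# Hypothetical hand-written synonym table; a real system would need a thesaurus.
SYNONYMS = {
    "ruler": ["president", "king", "dictator"],
}

def expand(word):
    """Return the word plus any known synonyms, as a list of alternatives."""
    word = word.lower()
    # Crude plural handling (assumption): try the word without a final "s".
    base = word[:-1] if word.endswith("s") and word[:-1] in SYNONYMS else word
    return [base] + SYNONYMS.get(base, [])

def parse_of(query):
    """Split 'X of Y of Z' into its parts, treating 'of' as a connective."""
    return [part.strip() for part in query.lower().split(" of ")]

def to_boolean(query):
    """Render the query as an AND of ORed synonym groups, one group per word."""
    groups = []
    for part in parse_of(query):
        for word in part.split():
            groups.append("(" + " OR ".join(expand(word)) + ")")
    return " AND ".join(groups)

def matches(query, document):
    """True if a single sentence contains an alternative from every group."""
    groups = [expand(w) for part in parse_of(query) for w in part.split()]
    for sentence in re.split(r"[.!?]+", document.lower()):
        words = set(re.findall(r"[a-z0-9]+", sentence))
        if all(any(alt in words for alt in group) for group in groups):
            return True
    return False

print(to_boolean("rulers of russia"))
# (ruler OR president OR king OR dictator) AND (russia)

# Hypothetical sample texts.
roof_article = (
    "Notes on roof repair. My manual transmission Toyota truck "
    "rear-ended a 1957 Ford Fairlane."
)
print(matches("repair of transmission", roof_article))  # False
print(matches("rulers of russia", "Ivan IV was the first king of Russia."))  # True
```

The same-sentence rule is far weaker than real parsing, but it is enough to reject the roof repair article, where "repair" and "transmission" appear in different sentences, while still accepting a text that says "king" where the query said "rulers".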

This level of simple grammatical understanding is orders of magnitude simpler than the global analysis, and the knowledge of unlimited sets of information sources, that a general purpose question answerer would require.

I’d like to know if anyone else finds this interesting, or knows of any leads for exploring anything related to these possibilities.

By the way, when I entered “rulers of russia” into Wolfram Alpha, the answer was still Putin, with brief mention of others going back to 1993. So “Russia” seems to be implicitly defined as the entity that has existed since 1993, and the result is an attempt at an *answer to the (assumed) question* rather than a good list of articles that could shed light on various reasonable interpretations of the phrase.