On natural language processing

Wouter was wondering about natural language processing. I have become quite interested in that field, although I also lack any real knowledge of it beyond a couple of fairly simple articles I have read and talks I have attended. A great resource on this is Alexander Gelbukh; I saw his talk at the 30th CLEI conference in Arequipa, Perú. He has some quite interesting articles about NLP on his web site, although they are in Spanish (anyway, for anybody interested: Avances en análisis automático de textos and Tendencias recientes en el procesamiento de lenguaje natural), but browse around, there are still many good links.

The basic idea from the two Spanish articles is that NLP goes through the same basic steps as a formal language compiler (i.e., lexing, parsing, semantic analysis). The main difference is that any sentence in a natural language has many implicit relations with a universe of knowledge around it, so you cannot just build a parse tree for each sentence: you must have a universe of concepts and fit the parse tree of each sentence of the text you analyze into it. Of course, in order to do so, you must also resolve the ambiguities that are so common in spoken language, but that's a whole other topic.

Gelbukh's work is, AFAICT, geared towards data mining: performing automatic analysis of many texts and coming up with conclusions that are not explicitly stated in any of them, probably with mechanisms to trace back which pieces of information led the system to each conclusion. As I told you, I really like this topic, and I intend to dive deeper into it as soon as I get out of some obligations… But I'm sure Gelbukh's page will make for interesting reading.
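
To make the compiler analogy a bit more concrete, here is a very rough Python sketch of the three steps (lexing, parsing, semantic analysis against a universe of concepts). Everything in it is made up for illustration: the `UNIVERSE` dictionary, the three-word toy grammar and the function names are my own assumptions, not anything taken from Gelbukh's articles, and a real system would be vastly more involved.

```python
import re

# Hypothetical "universe of concepts": each known word maps to its possible senses.
UNIVERSE = {
    "cat": ["animal"],
    "chases": ["action"],
    "mouse": ["animal", "pointing device"],  # ambiguous on purpose
}

ARTICLES = {"the", "a", "an"}


def lex(sentence):
    """Lexing: turn raw text into a list of lowercase word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def parse(tokens):
    """Parsing: toy grammar that only accepts 'subject verb object'."""
    words = [t for t in tokens if t not in ARTICLES]
    if len(words) != 3:
        raise ValueError("toy grammar only handles subject-verb-object sentences")
    subject, verb, obj = words
    return {"subject": subject, "verb": verb, "object": obj}


def analyze(tree, universe):
    """Semantic analysis: attach to each word its candidate concepts."""
    return {role: (word, universe.get(word, [])) for role, word in tree.items()}


tree = parse(lex("The cat chases a mouse"))
result = analyze(tree, UNIVERSE)
print(result)
```

Note that "mouse" comes back with two candidate senses: the parse tree alone cannot tell which one is meant, which is exactly the ambiguity problem mentioned above.
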
Another project I really enjoyed (completely unrelated to what I wrote above, as its realm lies much further down, near the lexical/grammatical analysis phases) is Snowball, a free language for writing stemming algorithms, with stemmers already implemented for many European languages. The Snowball site also has a very nice article on what stemming is, how it works, and how it has evolved over time.
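
For anybody who has never met stemming, the idea can be sketched in a few lines of Python. This is NOT one of the Snowball algorithms, just a naive suffix stripper I am assuming for illustration (the suffix list and the `toy_stem` name are hypothetical); the real stemmers apply much more careful, language-specific rules.

```python
# Suffixes to strip, checked in order; a deliberately tiny, English-only list.
SUFFIXES = ("ing", "ed", "es", "s")

# Keep at least this many characters so that short words such as "is" survive.
MIN_STEM_LENGTH = 3


def toy_stem(word):
    """Reduce a word to a crude stem by removing the first matching suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


for w in ("connect", "connects", "connected", "connecting"):
    print(w, "->", toy_stem(w))
```

All four forms end up as "connect", so a search index built on the stems would treat them as the same word.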


EdCrypt 2005-01-25 14:45:56

RE: On natural language processing

Did you take a look at GNOME Storage? NLP is an important piece of it.