Projects & Areas of Interest
Information Retrieval (IR) is the process of searching for an "electronic document" with specific content. Originally the paradigm was restricted to searching homogeneous collections of digital text documents, such as bibliographic databases. With the development of information technology and networking, the notion of a document has been widened to "multimedia documents" (such as images, sound, and video) and to heterogeneous "collections" such as the documents of an organization (the "Data Warehouse" concept) or the World Wide Web (web search). The search for text documents is now sometimes called Text Retrieval, as opposed to Multimedia Retrieval.
One of the primary challenges of IR research is to communicate the content of a document or an "information need" between humans and machines. This requires a representation of the information (or knowledge) that is simple enough to be "understood" by both humans and machines, yet complex enough to distinguish between documents with diverse content and to compare that content in a way that is appropriate for human users.
With regard to such representations, text documents have some advantages over other media. The first is that language and writing are knowledge representations with a long tradition that many people are familiar with. The second is that text can be decomposed into words, which form something like atomic units of meaning. This by no means implies that the content of a text is contained in the collection of its words, but this collection preserves much more of the content of a text than, for example, the color histogram of a picture preserves of the picture. These collections of words can be used as a coarse representation of the content of a text. For use in IR, such a coarse representation is not only a problem (because it lacks precision) but also an opportunity: as a kind of "generalized" representation, it makes it possible to recall documents that are only vaguely related to an information need. It is part of the IR challenge to avoid the problems caused by such representations and to enhance the benefits they offer.
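The idea of treating a text as a collection of its words can be illustrated with a minimal sketch in Python. This is only one simple way to do it, not a method taken from the lectures or the textbook mentioned on this page: the function names `bag_of_words`, `cosine_similarity` and `rank` are hypothetical, the tokenizer is a crude assumption (lowercase ASCII letters only), and cosine similarity over raw word counts is just one of several possible ways to compare two word collections.

```python
import re
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Reduce a text to a multiset of lowercase word tokens (word order is lost)."""
    # Assumption: a very crude tokenizer that keeps only runs of ASCII letters.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document) pairs sorted by descending similarity to the query."""
    q = bag_of_words(query)
    scored = [(cosine_similarity(q, bag_of_words(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Both sides of the coarse representation show up here. "The dog bites the man" and "The man bites the dog" yield exactly the same word collection, so the difference in meaning is lost (the precision problem). On the other hand, a query such as "retrieval of text documents" still gets a nonzero score for a document like "image retrieval with color histograms" because the two share a single word, so loosely related documents can be recalled.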
Lectures and Reading on IR
Between 1995 and 2000 I gave lectures (Vorlesungen) on Information Retrieval and the application of Data Mining methods to IR at the Darmstadt University of Technology: "Data Mining und Information Retrieval" and "Informationssysteme" (both in German). I also authored the chapter on IR ("Dokumentsuche und Dokumenterschließung") in the first German computer science handbook (Hanser Informatik-Handbuch). In March 2003 I published the textbook "Information Retrieval - Suchmodelle und Data-Mining-Verfahren für Textsammlungen und das Web", which gives a rather detailed introduction to IR methods, the application of Data Mining methods to IR, and the use of IR methods in web search. A web version of the book's complete content is available (in German).