Cross Language Retrieval

A "parallel" corpus of documents in two different languages can be used to search for documents in one language with a query formulated in the other. To this end, documents with the same content in both languages are used to compute similarity values between the terms of the two languages. These similarities are used to expand the query with terms of the other language, and the added terms are then used to search the collection in that language.
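The idea can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the text: the similarity measure is the cosine between binary occurrence vectors over the aligned document pairs, tokenization is plain whitespace splitting, and the three German-Italian document pairs are invented toy data. The function names (term_occurrences, cross_similarities, expand_query) are hypothetical.

```python
import math
from collections import defaultdict


def term_occurrences(docs):
    """Map each term to the set of pair indices in which it occurs."""
    occ = defaultdict(set)
    for i, text in enumerate(docs):
        for term in set(text.lower().split()):
            occ[term].add(i)
    return occ


def cross_similarities(src_docs, tgt_docs):
    """Cosine similarity between source and target terms.

    src_docs[i] and tgt_docs[i] are assumed to have the same content.
    Each term is represented by a binary vector over the aligned pairs.
    """
    src_occ = term_occurrences(src_docs)
    tgt_occ = term_occurrences(tgt_docs)
    sims = defaultdict(dict)
    for s, s_pairs in src_occ.items():
        for t, t_pairs in tgt_occ.items():
            shared = len(s_pairs & t_pairs)
            if shared:
                sims[s][t] = shared / math.sqrt(len(s_pairs) * len(t_pairs))
    return sims


def expand_query(query, sims, k=2):
    """Return the k target-language terms most similar to the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for t, sim in sims.get(term, {}).items():
            scores[t] += sim
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


# Invented toy corpus: german_docs[i] and italian_docs[i] form one pair.
german_docs = ["bundesrat wahl bern", "lawine wallis", "wahl parlament"]
italian_docs = ["consiglio elezione berna", "valanga vallese", "elezione parlamento"]

sims = cross_similarities(german_docs, italian_docs)
print(expand_query("wahl", sims, k=1))
```

In this toy corpus "wahl" and "elezione" occur in exactly the same pairs, so "elezione" ranks first. With realistic data the pairwise loop would be too slow, and a weighting scheme would normally replace the binary vectors.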

Sheridan and Ballerini (1996) used 93 229 Italian news stories of the Swiss News Agency, indexed by time, location, and a content category (one of 50 possible classes). From the German service of the same agency, 10 293 articles with the same indexing could be identified. Based on these pairs of identically indexed articles, similarity values for the terms occurring in them were computed and used to expand German queries. The results were compared with those obtained with manually produced translations of the queries.
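Pairing articles by identical indexing can be sketched as a lookup on a shared key. This is only an illustration: the exact matching rule of the study is not given here, and the field names (id, date, location, category) and the sample records are hypothetical.

```python
from collections import defaultdict


def index_key(article):
    """Hypothetical key: the three indexing fields named in the text."""
    return (article["date"], article["location"], article["category"])


def align_by_index(src_articles, tgt_articles):
    """Pair the ids of articles whose indexing keys are identical."""
    by_key = defaultdict(list)
    for art in tgt_articles:
        by_key[index_key(art)].append(art)
    pairs = []
    for art in src_articles:
        for match in by_key.get(index_key(art), []):
            pairs.append((art["id"], match["id"]))
    return pairs


# Invented sample records, not taken from the study.
german = [
    {"id": "de1", "date": "1995-03-01", "location": "Bern", "category": 12},
    {"id": "de2", "date": "1995-03-02", "location": "Zürich", "category": 7},
]
italian = [
    {"id": "it1", "date": "1995-03-01", "location": "Bern", "category": 12},
    {"id": "it2", "date": "1995-03-02", "location": "Lugano", "category": 7},
]

pairs = align_by_index(german, italian)
print(pairs)
```

Articles without a counterpart under the same key, such as de2 and it2 here, simply drop out of the set of pairs.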

Fig. 27: Results of the Study on Cross Language Retrieval Using Similarity Measures Generated from Parallel Corpora (from Sheridan and Ballerini, 1996).


