3.7.2: Co-occurrence Based Methods

The weighting strategies for terms in a document vector are applied to single words: dependencies between words are not taken into account when index or query terms are selected. However, the occurrence of a term in a document is not independent of the occurrence of other terms in the same document. These dependencies can be used to select index and query terms.

The associative model assumes that the meanings of terms occurring frequently together in documents are related. This principle is a rather old one:

"objects once experienced together tend to become associated in the imagination, so that when any one of them is thought of, the others are likely to be thought of also"

(James, 1890, Vol. 1, page 561)

Fig. 26: Associations Automatically Generated from the LOB and the Brown Corpus Using Co-occurrences of Terms

Several approaches have used similarities between terms based on their co-occurrence in documents to

Similarities between terms can be derived from the document vectors of a collection, which together form a so-called term-document matrix:

$$W = \{w_{i,j}\}_{i=1,\dots,m;\ j=1,\dots,n}$$

This matrix can be multiplied by its transpose, yielding an $n \times n$ matrix $W^{T} \cdot W$ called the term-term matrix. Since the product is $n \times n$, the index $j = 1, \dots, n$ runs over the terms and the index $i = 1, \dots, m$ over the documents. The entries of the term-term matrix are inner products of "term vectors", i.e. vectors composed of the weights that the respective term is given in the documents of the collection; a term vector describes the meaning of a term by the documents it occurs in. The inner product of the term vectors of two terms can be used as a similarity value for the two terms.
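The construction can be sketched in a few lines of Python with NumPy. This is only a minimal illustration: the toy weights, the term list and the helper names `term_term_matrix` and `most_similar` are assumptions made for the example and do not come from the text. Rows of `W` are taken to be documents and columns to be terms, so that the product has one row and one column per term.

```python
import numpy as np

# Hypothetical toy collection: m = 3 documents (rows), n = 4 terms (columns).
# The weights are made up for illustration only.
terms = ["retrieval", "index", "query", "corpus"]
W = np.array([
    [2.0, 1.0, 0.0, 0.0],   # document 1
    [1.0, 0.0, 3.0, 0.0],   # document 2
    [0.0, 0.0, 1.0, 2.0],   # document 3
])


def term_term_matrix(W):
    """Return W^T . W: entry (j, k) is the inner product of the
    term vectors (columns of W) of term j and term k."""
    return W.T @ W


def most_similar(T, terms, term):
    """Rank all other terms by their inner-product similarity to `term`."""
    j = terms.index(term)
    others = [(terms[k], float(T[j, k])) for k in range(len(terms)) if k != j]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


T = term_term_matrix(W)
ranking = most_similar(T, terms, "retrieval")
```

For the term "retrieval", the ranking in this toy collection puts "query" first (inner product 3), then "index" (2), then "corpus" (0), because "retrieval" and "query" share a document in which "query" carries a high weight. Note that the raw inner product is not normalized, so terms with large weights tend to receive large similarity values.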

3.7.2.1: Associative Indexing and Query Expansion

3.7.2.2: Cross Language Retrieval


© 1998 / HTML-Version 17. 11. 1998: R. Ferber