### 3.7.2: Co-occurrence Based Methods

The weighting strategies for terms in a document vector are applied
to single words. Dependencies between words are not taken into account
when index or query terms are selected.
But the occurrence of a term in a document is not
independent of the occurrence of other terms in the same document.
These dependencies can be used to select index and query
terms.

The associative model assumes that the meanings of terms
occurring frequently together in documents
are related. This principle is a rather
old one:

"objects once experienced together tend to become associated
in the imagination, so that when any one of them is thought of, the
others are likely to be thought of also"

(James, 1890, Vol 1, page 561)

Several approaches have used
similarities between terms based on their co-occurrence
in documents to

- model human associations and memory
- construct associative thesauri based on the similarity of
terms
- use these thesauri for automated indexing
- use these thesauri for query expansion

Similarities between terms can be derived from the document vectors
of a collection. Together these vectors form a so-called
*Term-Document-Matrix*

$$W = \{w_{i,j}\}_{i=1,\dots,m;\ j=1,\dots,n}$$

This matrix can be multiplied by its transpose, which yields
an $n \times n$ matrix $W^{t} \cdot W$ (one row and one column per term)
called the *Term-Term-Matrix*.
The entries of this matrix are inner products of "term vectors".
A term vector is composed of the
weights that the respective term is given in the documents of the
collection, i.e. it describes
the meaning of a term by the documents it occurs in. The inner product
of the term vectors of two terms can be used as a
similarity value for the two terms.
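
The construction can be illustrated with a minimal sketch in plain Python. The terms, the weights, and the helper names `term_term_matrix` and `most_similar` are invented for illustration and do not come from any particular system. The sketch assumes the layout implied above: each row of `W` is a document vector and each column belongs to one term.

```python
# Toy term-document matrix: one row per document, one column per term.
# Terms and weights are made up for illustration only.
terms = ["retrieval", "index", "query", "banana"]
W = [
    [2.0, 1.0, 1.0, 0.0],  # document 1
    [1.0, 0.0, 2.0, 0.0],  # document 2
    [0.0, 0.0, 0.0, 3.0],  # document 3
]


def term_term_matrix(W):
    """Return W^t * W; entry [j][k] is the inner product of term vectors j and k."""
    m, n = len(W), len(W[0])
    return [
        [sum(W[i][j] * W[i][k] for i in range(m)) for k in range(n)]
        for j in range(n)
    ]


def most_similar(term, terms, T):
    """Rank all other terms by their inner-product similarity to `term`."""
    j = terms.index(term)
    ranked = [(terms[k], T[j][k]) for k in range(len(terms)) if k != j]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


T = term_term_matrix(W)
print(most_similar("retrieval", terms, T))
# prints [('query', 4.0), ('index', 2.0), ('banana', 0.0)]
```

In this toy collection "retrieval" and "query" share two documents with high weights, so they receive the largest similarity value, while "banana" never co-occurs with "retrieval" and gets the value 0. Note that the raw inner product grows with the size of the weights; the sketch applies no normalization.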

© 1998 / HTML-Version 17. 11. 1998: R. Ferber