Most global strategies use term statistics to determine the usefulness of terms for retrieval:

Fig. 20: Discrimination power vs. frequency (from Salton & McGill 1983)

Inverse document frequency

To handle frequent terms, give low weights to terms that occur in many documents. The document frequency d(j) of term t_j is defined as the number of documents in which t_j occurs. Method: inverse document frequency (IDF), commonly defined as

$\mathrm{idf}(t_j) = \log \frac{m}{d(j)}$

where m again denotes the total number of documents in the collection.
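As a minimal sketch of the weighting described above, the following Python snippet computes d(j) and an IDF weight for each term of a toy collection. It assumes the common form idf = log(m / d(j)) with the natural logarithm; the function names and the three sample documents are hypothetical and chosen only for illustration, and other variants (for example smoothed ones) may be intended.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Return d(j): the number of documents in which each term occurs."""
    df = Counter()
    for doc in docs:
        # A set counts each term at most once per document.
        df.update(set(doc))
    return df


def inverse_document_frequencies(docs):
    """Return log(m / d(j)) per term (assumed IDF variant, natural log)."""
    m = len(docs)  # total number of documents in the collection
    df = document_frequencies(docs)
    return {term: math.log(m / d) for term, d in df.items()}


# Hypothetical toy collection of tokenized documents.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]

idf = inverse_document_frequencies(docs)
for term in sorted(idf, key=idf.get):
    print(f"{term:8s} d(j)={document_frequencies(docs)[term]}  idf={idf[term]:.3f}")
```

In this toy collection, "the" occurs in every document and receives weight 0, while terms that occur in a single document receive the highest weight, which matches the goal of down-weighting frequent terms.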


