Global Weighting Strategies

Most global strategies use term statistics to determine the usefulness of terms for retrieval:

Fig. 20: Discrimination Power vs. Frequency (from Salton & McGill 1983)

Inverted document frequency

To handle frequent terms, give low weights to terms that occur in many documents. The document frequency d(j) of term tj is defined as the number of documents in which the term occurs. Method: inverted document frequency (IDF);

where m again denotes the total number of documents in the collection.
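
To illustrate how such a weight behaves, here is a minimal Python sketch. It assumes the common logarithmic form log(m / d(j)), which is one widely used variant and not necessarily the exact formula meant in this section (other variants add a constant or use a different logarithm base). The function names and the toy collection are hypothetical.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Return d(j) for every term: the number of documents it occurs in."""
    df = Counter()
    for doc in docs:
        # A set counts each term once per document, however often it repeats.
        df.update(set(doc))
    return df


def idf_weights(docs):
    """Assumed IDF variant: weight(t_j) = log(m / d(j)), natural logarithm."""
    m = len(docs)  # total number of documents in the collection
    df = document_frequencies(docs)
    return {term: math.log(m / d) for term, d in df.items()}


# Hypothetical toy collection of three tokenized documents.
docs = [
    ["information", "retrieval", "system"],
    ["information", "theory"],
    ["information", "retrieval", "evaluation"],
]

weights = idf_weights(docs)
for term in sorted(weights, key=weights.get):
    print(f"{term:12s} d(j)={document_frequencies(docs)[term]}  idf={weights[term]:.3f}")
```

Under this assumed form, a term that occurs in every document ("information") gets weight 0, while a term that occurs in only one document ("system") gets the highest weight, which matches the stated goal of giving frequent terms low weights.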


© 1998 / HTML-Version 17. 11. 1998: R. Ferber