3.3.3: Thesauri

A thesaurus describes the words or terms of a specific domain or vocabulary and the relations between them.

Relations are not restricted to hierarchical relations. Examples are:

Furthermore, a thesaurus may define one or several meanings of a word.

In an IR system, a thesaurus has an additional role: it defines a controlled vocabulary, a subset of all words in the thesaurus, that is used to index documents. Formal definition of indexing with a controlled vocabulary T: an attribute is defined whose range (set of values) is the set of subsets of T, so that each document is assigned a subset of the controlled vocabulary.

The controlled vocabulary contains exactly one descriptor for each set of mutually synonymous words. All other words of such a "synonym set" have a "USE" relation pointing to this descriptor. Thus, for indexing and retrieval, a unique term is used for each set of synonyms. The definition of the "synonym" relation controls how much detail can be represented by means of a thesaurus.
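As a rough illustration of indexing with a controlled vocabulary, the following Python sketch maps every word to its descriptor via the "USE" relation and assigns each document a subset of T. The vocabulary entries and the names USE, T, descriptor and index_document are made up for this example; a real system would also handle multi-word terms and word forms.

```python
# Hypothetical "USE" relation: each non-descriptor points to its descriptor.
USE = {
    "automobile": "car",
    "motorcar": "car",
    "notebook": "laptop",
}

# Hypothetical controlled vocabulary T: one descriptor per synonym set.
T = {"car", "laptop"}


def descriptor(word):
    """Return the descriptor for a word, or None if it is not covered by T."""
    w = word.lower()
    w = USE.get(w, w)  # follow the "USE" relation if the word is a non-descriptor
    return w if w in T else None


def index_document(text):
    """Return the subset of T assigned to a document (the indexing attribute)."""
    found = set()
    for token in text.split():
        d = descriptor(token.strip(".,;:!?"))
        if d is not None:
            found.add(d)
    return found
```

Because "automobile" and "motorcar" both resolve to the descriptor "car", a document using either word and a query using either word meet at the same index term.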

Furthermore, the controlled vocabulary is structured hierarchically by the "more general" and "more specific" relations. These relations can be used to make queries more general or more specific.
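One possible way to use the hierarchy for queries is sketched below in Python. The descriptors and the names BROADER, more_general, more_specific and with_descendants are assumptions made for this example; it also assumes that each descriptor has at most one direct "more general" descriptor, which a real thesaurus need not guarantee.

```python
# Hypothetical "more general" relation: descriptor -> direct broader descriptor.
BROADER = {
    "sports car": "car",
    "car": "vehicle",
    "truck": "vehicle",
}


def more_general(term):
    """Return the direct more general descriptor, or None at the top level."""
    return BROADER.get(term)


def more_specific(term):
    """Return the direct more specific descriptors (inverse of BROADER)."""
    return {t for t, b in BROADER.items() if b == term}


def with_descendants(term):
    """Return the term together with all transitively more specific descriptors."""
    result = {term}
    stack = [term]
    while stack:
        for child in more_specific(stack.pop()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result
```

A query for "car" can be made more general by replacing the term with more_general("car"), or more specific by choosing one of more_specific("car"). Searching with with_descendants("vehicle") additionally matches documents that were indexed only with a more specific descriptor.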

