Interaction Merger for Associations Gained by Inspection of Numerous Exemplars

IMAGINE is a tool that extracts associations between objects based on the co-occurrence of these objects in large corpora. In this implementation, the objects may be words, names, or thesaurus descriptors occurring in large text corpora.

IMAGINE is based on fixed vocabularies of objects: one for the input and one for the output. The input and output vocabularies may be the same. In response to a query consisting of objects from the input vocabulary, IMAGINE determines related objects from the output vocabulary and offers them to the user. Only objects from these two vocabularies can be processed.

Description of the Program

Input Area

The input area offers several means of input. First, there is a field for direct keyboard input. (On some systems, any highlighted text can be copied into this field by pressing the middle mouse button.)


Below this input field is the history field. It contains the queries already made during the session. These queries can be added to the current query by selecting them with the mouse; however, they cannot be edited.

Finally, in the non-framed version, terms from previous results can be selected using the checkboxes next to them.

Input Processing

IMAGINE splits the input string into blocks consisting only of the characters a-z, A-Z, äüöÄÖÜ and the symbols & | - _ '. In what follows, these blocks are called (input) words.
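The splitting rule can be illustrated with a regular expression over exactly the character set listed above. This is a minimal sketch, not IMAGINE's own code; the names WORD_PATTERN and split_into_words are hypothetical.

```python
import re

# Characters allowed in an input word, as listed in the text:
# a-z, A-Z, the umlauts äüöÄÖÜ, and the symbols & | - _ '
WORD_PATTERN = re.compile(r"[a-zA-ZäüöÄÖÜ&|\-_']+")


def split_into_words(query: str) -> list[str]:
    """Return the maximal runs of allowed characters; anything else separates words."""
    return WORD_PATTERN.findall(query)


print(split_into_words("information-retrieval, O'Brien & 2 systems!"))
# → ['information-retrieval', "O'Brien", '&', 'systems']
```

Digits, commas, and other punctuation act as separators here, so "2" does not become a word, while hyphenated forms and apostrophes stay inside a single word.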

Input words can be weighted with real numbers. To do so, the number is placed in front of the word, separated by a blank. If no number is given, the default weight 1.0 is assumed. A number serves as the weight for all following words until a new weight is given. Hence, to give a weight x to only one word in a query while leaving the other words weighted with 1.0, x has to be placed in front of that word and the weight 1.0 has to be inserted after it.
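The "sticky" weight rule can be sketched as a left-to-right scan in which each number replaces the current weight. This sketch assumes weights are written as unsigned decimal numbers such as 2 or 0.5 (the exact number syntax IMAGINE accepts is not specified here); parse_weighted_query is a hypothetical name.

```python
import re

# Assumption: a weight is an unsigned decimal number ("2", "0.5", ".5");
# words use the character set described in the text.
TOKEN_PATTERN = re.compile(
    r"(?P<weight>\d+(?:\.\d+)?|\.\d+)|(?P<word>[a-zA-ZäüöÄÖÜ&|\-_']+)"
)


def parse_weighted_query(query: str) -> list[tuple[str, float]]:
    """Pair each input word with the weight in force when it appears."""
    weight = 1.0  # default weight until a number is given
    result = []
    for match in TOKEN_PATTERN.finditer(query):
        if match.group("weight") is not None:
            # a number applies to all following words until the next number
            weight = float(match.group("weight"))
        else:
            result.append((match.group("word"), weight))
    return result


print(parse_weighted_query("2 neural 1.0 network model"))
# → [('neural', 2.0), ('network', 1.0), ('model', 1.0)]
```

In the usage line, only "neural" receives the weight 2.0 because 1.0 is reinserted after it; without that reset, "network" and "model" would also be weighted 2.0.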

The input words are compared with the predefined input vocabulary. Words that are part of this vocabulary are displayed, together with any weights given, under "words that IMAGINE recognizes". Words that are not in the input vocabulary are displayed under "words IMAGINE does not recognize"; these words are ignored in further processing.
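The vocabulary check amounts to partitioning the weighted words into two lists. A minimal sketch, assuming exact, case-sensitive matching (the text does not say how IMAGINE compares strings); split_by_vocabulary is a hypothetical name.

```python
def split_by_vocabulary(weighted_words, input_vocabulary):
    """Separate (word, weight) pairs into recognized pairs and unrecognized words.

    Assumes exact, case-sensitive string matching against the vocabulary.
    """
    recognized = [(word, weight) for word, weight in weighted_words
                  if word in input_vocabulary]
    unrecognized = [word for word, _ in weighted_words
                    if word not in input_vocabulary]
    return recognized, unrecognized


vocabulary = {"neural", "network", "model"}
recognized, unrecognized = split_by_vocabulary(
    [("neural", 2.0), ("netwrok", 1.0), ("model", 1.0)], vocabulary)
print(recognized)    # → [('neural', 2.0), ('model', 1.0)]
print(unrecognized)  # → ['netwrok']
```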

Determination of Related Terms

For each input word that is part of the input vocabulary, IMAGINE computes its associations to all terms of the output vocabulary. Each output term receives an activation equal to the product of the association and the weight of the input word. These activations are summed over all input words, and the total activation received by an output term is interpreted as a measure of its relevance for the query.

Finally, the terms of the output vocabulary are sorted by activation, and the top-ranked terms are presented to the user.
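The weighted sum and the ranking can be sketched as follows. The association values are assumed to be precomputed and supplied as a hypothetical nested dictionary; how IMAGINE stores or derives them from co-occurrence data is not described here. Output terms with no association to any input word are simply omitted, which matches the text in effect, since they would receive zero activation.

```python
from collections import defaultdict


def rank_output_terms(recognized, associations):
    """Spread weighted activation to output terms and sort them by relevance.

    recognized:   list of (input_word, weight) pairs
    associations: hypothetical lookup input_word -> {output_term: association}
    """
    activation = defaultdict(float)
    for word, weight in recognized:
        for term, association in associations.get(word, {}).items():
            # each input word contributes association x weight to the term
            activation[term] += association * weight
    # highest summed activation first
    return sorted(activation.items(), key=lambda item: item[1], reverse=True)


associations = {
    "neural": {"brain": 0.8, "computing": 0.3},
    "network": {"computing": 0.6, "graph": 0.4},
}
print(rank_output_terms([("neural", 2.0), ("network", 1.0)], associations))
```

Here "brain" ranks first (0.8 × 2.0), followed by "computing", which collects activation from both input words (0.3 × 2.0 + 0.6 × 1.0), and then "graph".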

Influence of Rare Terms

The overall frequency of a term or word in the corpus used for the co-occurrence analysis can be taken into account when calculating the associations between input words and output terms. This makes it possible to scale the influence of rare words and terms. The parameters Impact of rare terms serve this purpose; their values should lie between 0.0 and 1.5. There are two separate parameters: one for the input words and one for the output terms. If the parameter for the input words is small, frequent input words have a stronger influence on the selection of output terms. If it is large, rare input words have more impact on the answer.

If the parameter for the output terms is small, frequent output terms are more likely to appear at the top of the output list. If it is large, rare terms are more likely to be selected.

Experimental studies have shown that good values are 1.03 for the impact of input words and 0.45 for the impact of output terms.
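The text does not give the formula IMAGINE uses, so the following is only a hypothetical illustration of how such parameters can behave: the co-occurrence count is divided by each corpus frequency raised to its impact parameter. With this assumed form, a larger exponent damps frequent items more strongly, which reproduces the qualitative behavior described above. The function name and defaults (taken from the recommended values) are assumptions.

```python
def scaled_association(cooccurrences, freq_input, freq_output,
                       impact_input=1.03, impact_output=0.45):
    """Hypothetical frequency-scaled association; NOT IMAGINE's documented formula.

    Divides the co-occurrence count by each corpus frequency raised to its
    impact parameter, so larger parameters favour rare words or terms.
    """
    if cooccurrences == 0:
        return 0.0
    # cooccurrences > 0 implies both frequencies are > 0, so no division by zero
    return cooccurrences / (freq_input ** impact_input * freq_output ** impact_output)


# A frequent output term (frequency 1000) vs. a rare one (frequency 10),
# both co-occurring with the same input word (frequency 100)
for impact in (0.0, 1.5):
    frequent = scaled_association(20, 100, 1000, impact_output=impact)
    rare = scaled_association(5, 100, 10, impact_output=impact)
    print(impact, "rare term wins" if rare > frequent else "frequent term wins")
```

With the output impact at 0.0 the frequent term ranks higher because of its larger co-occurrence count; at 1.5 the frequency penalty reverses the order.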

To set a value, either select one from the menu, or select the entry -> in the menu and enter the value in the input field to the right of the menu.


Relevance Values

IMAGINE calculates a relevance value for each term of the output vocabulary. To make these values easier to interpret, IMAGINE shows, for each term, the percentage of the sum of all relevance values that this term covers. If the response to a query consists of only a few terms that together cover a large part of this sum, the query is likely to be rather specific, and these terms are likely to be very relevant to it. If there are many terms that each cover only a small part of the sum, the query is more general.
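The percentage shown for a term is its relevance divided by the sum of all relevance values, times 100. A minimal sketch; relevance_percentages is a hypothetical name, and the zero-total guard is an added assumption for an empty or all-zero result.

```python
def relevance_percentages(ranked):
    """Convert (term, relevance) pairs into (term, percent of total relevance)."""
    total = sum(value for _, value in ranked)
    if total == 0:
        # assumption: report 0 % for every term when there is no activation at all
        return [(term, 0.0) for term, _ in ranked]
    return [(term, 100.0 * value / total) for term, value in ranked]


print(relevance_percentages([("brain", 3.0), ("computing", 1.0)]))
# → [('brain', 75.0), ('computing', 25.0)]
```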

Output Threshold

The parameter Output threshold restricts the output to terms that cover a larger part of the sum of relevance values than the given threshold.

Maximum Number of Displayed Terms

In addition, the output is restricted by the maximum number of displayed terms. A large value for this parameter may increase processing time.
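Both restrictions can be applied in sequence to a ranked list of percentages. This sketch assumes the threshold is expressed in percent and that the input is already sorted in descending order; select_displayed_terms is a hypothetical name.

```python
def select_displayed_terms(percentages, output_threshold, max_terms):
    """Apply the Output threshold, then the maximum number of displayed terms.

    percentages: (term, percent) pairs already sorted in descending order.
    """
    above_threshold = [(term, pct) for term, pct in percentages
                       if pct > output_threshold]
    return above_threshold[:max_terms]


shares = [("brain", 50.0), ("computing", 30.0), ("graph", 15.0), ("node", 5.0)]
print(select_displayed_terms(shares, output_threshold=10.0, max_terms=2))
# → [('brain', 50.0), ('computing', 30.0)]
```

Whichever limit is stricter determines the output: with max_terms=10 the same call would return the three terms above 10 %.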

Number of Columns

In the non-framed version, the parameter Number of columns determines how many columns are used to display the results.

Language Selection

IMAGINE allows the selection of several languages. This mainly affects the interface. Which languages can be used for input and output depends on the corpora and vocabularies used. If, for example, the output vocabulary is part of a multilingual thesaurus, IMAGINE can use the corresponding language version of this thesaurus.

In this case, however, the output in the various languages consists only of translations based on the thesaurus. In principle, a vocabulary is not restricted to terms from one language. However, using several languages in one vocabulary causes many problems; for example, a string of characters may occur in more than one language and have a different meaning in each.

HTML file generated by R. Ferber: 15. 12. 1997