3.6.1: Pseudo Relevance Feedback

Idea: use the documents found in a first run to enhance the query for a second run. This idea rests on the assumption that the top-ranked documents are likely to be relevant, i.e. that the system already performs quite well. In contrast to Rocchio relevance feedback, most systems did not use the complete document vectors to modify the query; instead they selected a limited number of the terms occurring most frequently in the top-ranked documents and used these to enhance the query.

In TREC 3, for example, SMART used the following procedure: after a first query run, the top 30 documents of the ranked list were used for feedback. All terms from these documents were ordered according to their frequency in the top 30 documents. The 500 most frequent terms were selected, the 30 document vectors were restricted to these 500 terms, and the restricted vectors were added to the query vector according to the Rocchio formula with parameters (8, 8, 0).
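
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not SMART's actual implementation, and it rests on several assumptions: vectors are represented as plain dictionaries mapping terms to weights, "frequency" is taken to mean the number of feedback documents that contain a term, and the Rocchio step uses the common form in which the query is scaled by the first parameter and the mean of the feedback vectors by the second. The third parameter is 0, so no non-relevant documents enter the sum. The function name and its arguments are hypothetical.

```python
from collections import Counter


def pseudo_relevance_feedback(query, ranked_docs, top_k=30, n_terms=500,
                              alpha=8.0, beta=8.0):
    """Expand a query from the top-ranked documents of a first run.

    query:       dict mapping term -> weight
    ranked_docs: list of dicts (term -> weight), best-ranked first
    """
    feedback = ranked_docs[:top_k]
    if not feedback:
        return dict(query)

    # Assumption: a term's frequency is the number of feedback
    # documents in which it occurs.
    doc_freq = Counter()
    for doc in feedback:
        doc_freq.update(doc.keys())

    # Keep the n_terms most frequent terms (ties broken alphabetically).
    ordered = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    selected = {term for term, _ in ordered[:n_terms]}

    # Rocchio with gamma = 0: alpha * query + beta * mean of the
    # feedback vectors, each restricted to the selected terms.
    new_query = {term: alpha * w for term, w in query.items()}
    for doc in feedback:
        for term, w in doc.items():
            if term in selected:
                new_query[term] = new_query.get(term, 0.0) + beta * w / len(feedback)
    return new_query
```

With the defaults the sketch mirrors the numbers quoted above (30 documents, 500 terms, parameters 8 and 8); smaller values of `top_k` and `n_terms` make it easy to try on a toy collection.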

Pseudo Relevance Feedback was quite successful.


