Robertson-Sparck Jones Formula

This formula uses relevance feedback data in a different way to determine a query vector. The weight it assigns to a term tk depends on four quantities:

N, the number of documents in the collection; R(q), the number of documents judged relevant for query q; d(k), the number of documents that contain term tk; and R(q,k), the number of relevant documents that contain term tk.

Details can be found in Robertson, Walker, Hancock-Beaulieu and Gatford (TREC 3) and in Robertson, Walker, Beaulieu, Gatford and Payne (TREC 4).

To compute the weight of term tk, this formula partitions the documents according to two criteria: whether a document contains term tk, and whether it is judged relevant for the query.

The numerator deals with the relevant documents, comparing those that contain the term with those that do not. The denominator makes the same comparison for the documents that are not relevant.
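Written out with the quantities N, R(q), d(k) and R(q,k), the weight is commonly stated in the following form. This is a sketch rather than a reproduction of the original display: the symbol w_k for the weight of term tk is a name introduced here, and the constants 0.5, which keep the ratios defined when a count is zero, are assumed from the usual statement of the formula in the Okapi TREC papers.

```latex
w_k = \log
  \frac{\bigl(R(q,k) + 0.5\bigr) \,/\, \bigl(R(q) - R(q,k) + 0.5\bigr)}
       {\bigl(d(k) - R(q,k) + 0.5\bigr) \,/\, \bigl(N - d(k) - R(q) + R(q,k) + 0.5\bigr)}
```

The numerator is the ratio of relevant documents with the term to relevant documents without it; the denominator is the same ratio taken over the non-relevant documents.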

Fig. 25: Number of Documents in a Collection when Classified According to the two Criteria: "Contains Term tk" and "Is Relevant to Query"
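The classification by these two criteria amounts to a 2x2 table whose four cells follow from N, R(q), d(k) and R(q,k). The following Python sketch derives the cells and computes a term weight from them. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the 0.5 smoothing constant is taken from the usual statement of the formula, and the natural logarithm is used because the base is not specified here.

```python
import math


def contingency_cells(N, R_q, d_k, R_qk):
    """Split the collection by "contains term tk" and "is relevant to q".

    N    -- number of documents in the collection
    R_q  -- number of documents judged relevant for q
    d_k  -- number of documents that contain term tk
    R_qk -- number of relevant documents that contain term tk
    """
    cells = {
        ("contains", "relevant"): R_qk,
        ("lacks", "relevant"): R_q - R_qk,
        ("contains", "non-relevant"): d_k - R_qk,
        ("lacks", "non-relevant"): N - d_k - R_q + R_qk,
    }
    # A negative cell means the four input counts contradict each other.
    if any(count < 0 for count in cells.values()):
        raise ValueError("inconsistent document counts")
    return cells


def rsj_weight(N, R_q, d_k, R_qk, smoothing=0.5):
    """Weight of term tk; `smoothing` (assumed 0.5) avoids division by zero."""
    cells = contingency_cells(N, R_q, d_k, R_qk)
    # Numerator: relevant documents, with the term versus without it.
    numerator = (cells[("contains", "relevant")] + smoothing) / (
        cells[("lacks", "relevant")] + smoothing
    )
    # Denominator: the same comparison for the non-relevant documents.
    denominator = (cells[("contains", "non-relevant")] + smoothing) / (
        cells[("lacks", "non-relevant")] + smoothing
    )
    return math.log(numerator / denominator)


if __name__ == "__main__":
    # Made-up counts: 1000 documents, 10 judged relevant, 50 contain the
    # term, and 8 of the relevant documents contain it.
    print(contingency_cells(1000, 10, 50, 8))
    print(rsj_weight(1000, 10, 50, 8))
```

With no relevance judgements at all (R(q) = 0 and R(q,k) = 0), the numerator equals 1 and the weight depends only on how many documents contain the term, so a term that is rare in the collection still receives a high weight.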


