### 3.6.4.2: Robertson-Sparck Jones Formula

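This formula uses relevance feedback data in a different way to
determine the weight *w*_{k} of term *t*_{k} in the query vector:

$$
w_k = \log \frac{\dfrac{R(q,k) + 0.5}{R(q) - R(q,k) + 0.5}}{\dfrac{d(k) - R(q,k) + 0.5}{N - d(k) - R(q) + R(q,k) + 0.5}}
$$

with *N* being the number of documents in the collection,
*R(q)* the number of documents judged relevant for query *q*,
*d(k)* the number of documents that contain term *t*_{k},
and *R(q,k)* the number of relevant documents that contain term
*t*_{k}.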
Details can be found in Robertson, Walker, Hancock-Beaulieu and
Gatford (TREC 3) and in Robertson, Walker, Beaulieu, Gatford and
Payne (TREC 4).

To compute the weight of term *t*_{k}, this formula partitions the
set of documents according to two criteria: whether a document
contains term *t*_{k}, and whether it is judged relevant for the
query.
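
The two criteria split the collection into four groups of documents. The following is a minimal sketch of that split, expressed with the counts *N*, *R(q)*, *d(k)* and *R(q,k)* defined above. The function name, the dictionary keys and the example numbers are illustrative assumptions, not part of the original text.

```python
def contingency_counts(N, R_q, d_k, R_qk):
    """Return the sizes of the four document groups for one term t_k.

    N    : number of documents in the collection
    R_q  : number of documents judged relevant for the query
    d_k  : number of documents that contain the term
    R_qk : number of relevant documents that contain the term
    """
    return {
        "relevant_with_term": R_qk,
        "relevant_without_term": R_q - R_qk,
        "nonrelevant_with_term": d_k - R_qk,
        "nonrelevant_without_term": N - d_k - R_q + R_qk,
    }


# Hypothetical counts, chosen only for illustration.
cells = contingency_counts(N=1000, R_q=20, d_k=50, R_qk=15)

# The four groups together cover every document exactly once.
assert sum(cells.values()) == 1000
```

Each document falls into exactly one group, so the four counts always add up to *N*.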

The numerator deals with the relevant documents, comparing those
that contain the term with those that do not. The denominator makes
the same comparison for the documents that are not relevant.

- A term that occurs in most relevant documents and in only a few
non-relevant documents gets a high value for the numerator and a low
value for the denominator, i.e. altogether a high weight.
- If the term also occurs in many non-relevant documents, the
denominator increases, leading to a smaller weight.
- If the term is contained in only a few relevant documents but in
many non-relevant documents, the numerator is small and the
denominator is large, leading to a small weight.
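
The three cases above can be checked numerically. The sketch below assumes the commonly used form of the weight, in which 0.5 is added to each of the four counts before the logarithm of the ratio is taken. The function name `rsj_weight`, the parameter `c` and all example counts are assumptions made for illustration.

```python
import math


def rsj_weight(N, R_q, d_k, R_qk, c=0.5):
    """Term weight from relevance feedback counts (assumed smoothed form).

    c is the constant added to every count; 0.5 is the usual choice and
    keeps the ratios finite when a count is zero.
    """
    # Relevant documents: those containing the term vs. those that do not.
    numerator = (R_qk + c) / (R_q - R_qk + c)
    # Non-relevant documents: the same comparison.
    denominator = (d_k - R_qk + c) / (N - d_k - R_q + R_qk + c)
    return math.log(numerator / denominator)


N, R_q = 1000, 20  # hypothetical collection size and number of relevant documents

# Case 1: term in most relevant and in few non-relevant documents.
w_high = rsj_weight(N, R_q, d_k=20, R_qk=18)
# Case 2: same relevant coverage, but also in many non-relevant documents.
w_smaller = rsj_weight(N, R_q, d_k=218, R_qk=18)
# Case 3: term in few relevant but in many non-relevant documents.
w_small = rsj_weight(N, R_q, d_k=202, R_qk=2)

print(w_high, w_smaller, w_small)
```

With these counts the weights decrease from the first case to the third, matching the qualitative behaviour described in the list.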

© 1998 / HTML-Version 17. 11. 1998: R. Ferber