3.6.3: Passage Retrieval

Several systems tried methods that divide documents into blocks or overlapping windows of fixed or limited size, both to calculate similarities and to perform pseudo relevance feedback. The idea behind this approach is that longer documents deal with specific topics in individual subsections. Similarity measures based on subsections of documents should therefore find these topics more easily than measures computed over the whole document.
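Splitting a document into fixed-size, overlapping windows can be sketched as follows. This is a minimal illustration, not any particular system's code: the function name `split_into_windows` and the `stride` parameter (how far each window advances; a stride smaller than the window size produces overlap) are hypothetical choices made for this example.

```python
from typing import List


def split_into_windows(tokens: List[str], size: int, stride: int) -> List[List[str]]:
    """Split a token list into windows of at most `size` tokens.

    A new window starts every `stride` tokens, so stride < size yields
    overlapping windows. The last window always reaches the end of the
    document, so no trailing tokens are dropped.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already covers the end of the document
        start += stride
    return windows


# Usage: 4-token windows that advance by 2 tokens (50% overlap).
print(split_into_windows(list("abcdefgh"), size=4, stride=2))
```

With blocks of 200 terms, a call such as `split_into_windows(tokens, 200, 100)` would give 50% overlap; the amount of overlap here is an assumption for illustration only.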

For example, in SMART the documents were divided into overlapping blocks of 200 terms. The similarity measure was defined as follows:

with B_d denoting the vectors of the blocks of document d ∈ D, and B_D denoting the vectors generated from all blocks of all documents in D.
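Block-based scoring can be illustrated with a short sketch. It does not reproduce the SMART similarity measure: it assumes raw term-frequency vectors (no term weighting) and takes the maximum query-block cosine similarity as the document score, which is only one common way to aggregate block similarities. The names `cosine`, `block_vectors`, and `passage_score` are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def block_vectors(tokens: List[str], size: int, stride: int) -> List[Counter]:
    """Term-frequency vectors of overlapping blocks (the B_d of one document)."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    vectors = []
    start = 0
    while True:
        vectors.append(Counter(tokens[start:start + size]))
        if start + size >= len(tokens):
            break
        start += stride
    return vectors


def passage_score(query: List[str], doc: List[str], size: int = 200, stride: int = 100) -> float:
    """Assumed aggregation: score a document by its best-matching block."""
    q = Counter(query)
    return max(cosine(q, b) for b in block_vectors(doc, size, stride))


# A short relevant passage buried in a long, otherwise unrelated document.
doc = ["apple", "pie"] + ["filler"] * 10
print(passage_score(["apple"], doc, size=2, stride=2))  # best block: ["apple", "pie"]
print(cosine(Counter(["apple"]), Counter(doc)))         # whole-document cosine
```

The example shows the intended effect: the block containing the query term scores about 0.71, while the whole-document cosine is diluted to about 0.10 by the unrelated text.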

In general, passage retrieval did not yield the expected improvements.


