The cosine measure is insensitive to the length of the document, because each document vector is divided by its Euclidean length.

For the results of TREC 3, the relative frequency with which documents of a given length were judged relevant was compared with the relative frequency with which they received a high rank under the cosine similarity measure. It turned out that short documents are ranked too high compared to the relevance judgements, while long documents are ranked too low compared to the relevance judgements. To compensate for this effect, the similarity measure was changed in such a way that these differences vanished (for details see: Singhal, Buckley & Mitra 1996).
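
The correction can be illustrated with a small sketch. It assumes the "pivoted" form of the normalization factor described in the cited paper, in which the old cosine normalizer is rotated around a pivot point. The function names, the toy documents, the choice of the mean norm as pivot, and the slope value 0.75 are illustrative assumptions, not values taken from the text above.

```python
import math


def cosine_norm(weights):
    """Euclidean length of a term-weight vector (the usual cosine normalizer)."""
    return math.sqrt(sum(w * w for w in weights))


def pivoted_norm(old_norm, pivot, slope):
    """Pivoted normalizer (assumed form): unchanged at the pivot,
    larger than old_norm below it and smaller above it when slope < 1."""
    return (1.0 - slope) * pivot + slope * old_norm


# Hypothetical toy collection: one short and one long document,
# each term with weight 1.0.
short_doc = [1.0] * 2
long_doc = [1.0] * 50

norms = [cosine_norm(short_doc), cosine_norm(long_doc)]
pivot = sum(norms) / len(norms)  # assumption: pivot = mean old normalizer
slope = 0.75                     # assumption: illustrative value only

short_new = pivoted_norm(norms[0], pivot, slope)
long_new = pivoted_norm(norms[1], pivot, slope)

print("short document:", norms[0], "->", short_new)
print("long document: ", norms[1], "->", long_new)
```

Because the similarity score is divided by the normalizer, a larger normalizer for the short document lowers its score, and a smaller normalizer for the long document raises its score. This is the direction of the correction described above; in practice the pivot and slope would have to be tuned on relevance data.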


