Concepts and similar pages to Page 167

Concepts

Similarity

Concept

Normalised symmetric difference

E measure

Relevance

Probabilistic retrieval

Document clustering

Document representative

Retrieval effectiveness

Typical document

Maximally linked document

Recall

differ considerably from those which the user feels are pertinent (Senko [21]) ... Fourthly, whereas Cooper has gone to some trouble to take account of the random element introduced by ties in the matching function, it is largely ignored in the derivation of Pnorm and Rnorm ... One further comment of interest is that Robertson [15] has shown that normalised recall has an interpretation as the area under the Recall-Fallout curve used by Swets ... Finally, mention should be made of two similar but simpler measures used by the SMART system ... and do not take into account the collection size N; n is here the number of relevant documents for the particular test query ... A normalised symmetric difference: Let us now return to basics and consider how it is that users could simply measure retrieval effectiveness ...

164

The SMART measures: In 1966, Rocchio gave a derivation of two overall indices of merit based on recall and precision ... The first of these indices is normalised recall ... Normalised recall (Rnorm) is the area between the actual case and the worst as a proportion of the area between the best and the worst ... (see Salton [23], page 285) ... A convenient explicit form of normalised recall is: $R_{norm} = 1 - \frac{\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} i}{n(N - n)}$, where N is the number of documents in the system and $n(N - n)$ is the area between the best and the worst case (to see this, substitute $r_i = N - i + 1$ in the formula for $A_b - A_a$) ...
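The explicit form of normalised recall lends itself to a short numerical check. The sketch below is illustrative only: it assumes the usual rank-sum formulation, in which the ranks of the n relevant documents are compared with the ideal ranks 1..n and scaled by n(N - n). The function name `normalised_recall` and the toy rankings are hypothetical, not taken from the SMART system.

```python
def normalised_recall(relevant_ranks, collection_size):
    """Normalised recall, assuming the rank-sum form
    R_norm = 1 - (sum(r_i) - sum(i)) / (n * (N - n)),
    where r_i are the 1-based ranks of the n relevant documents
    and N is the number of documents in the collection.
    """
    n = len(relevant_ranks)
    N = collection_size
    if n == 0 or n == N:
        # The denominator n * (N - n) vanishes in both cases.
        raise ValueError("R_norm is undefined when n == 0 or n == N")
    actual = sum(relevant_ranks)   # rank sum of the actual ranking
    ideal = n * (n + 1) // 2       # rank sum when relevant docs occupy ranks 1..n
    return 1.0 - (actual - ideal) / (n * (N - n))


# Toy collection of N = 10 documents with n = 3 relevant ones.
best = normalised_recall([1, 2, 3], 10)     # relevant documents ranked first
worst = normalised_recall([8, 9, 10], 10)   # relevant documents ranked last
mixed = normalised_recall([1, 5, 10], 10)   # somewhere in between
```

The worst case confirms the size of the normalising area: with ranks 8, 9, 10 the numerator is 27 - 6 = 21, which equals n(N - n) = 3 × 7.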

There are five commonly used measures of association in information retrieval ... The simplest of all association measures is $|X \cap Y|$ (simple matching coefficient), which is the number of shared index terms ... These may all be considered to be normalised versions of the simple matching coefficient ... then $|X_1| = 1$, $|Y_1| = 1$, $|X_1 \cap Y_1| = 1 \Rightarrow S_1 = 1, S_2 = 1$; $|X_2| = 10$, $|Y_2| = 10$, $|X_2 \cap Y_2| = 1 \Rightarrow S_1 = 1, S_2 = 1/10$; so $S_1(X_1, Y_1) = S_1(X_2, Y_2)$, which is clearly absurd since $X_1$ and $Y_1$ are identical representatives whereas $X_2$ and $Y_2$ are radically different ... Doyle [17] hinted at the importance of normalisation in an amusing way: One would regard the postulate "All documents are created equal" as being a reasonable foundation for a library description ...
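The normalisation argument can be reproduced on small term sets. The sketch below is hedged: the excerpt names only the simple matching coefficient, so the four normalised measures shown (Dice, Jaccard, cosine, overlap) are assumed to be the usual set-based definitions, and the function names and toy term sets are hypothetical.

```python
import math


def simple_matching(x, y):
    """Number of shared index terms, |X ∩ Y| (not normalised)."""
    return len(x & y)


def dice(x, y):
    """Dice's coefficient: 2|X ∩ Y| / (|X| + |Y|)."""
    return 2 * len(x & y) / (len(x) + len(y))


def jaccard(x, y):
    """Jaccard's coefficient: |X ∩ Y| / |X ∪ Y|."""
    return len(x & y) / len(x | y)


def cosine(x, y):
    """Cosine coefficient: |X ∩ Y| / sqrt(|X| * |Y|)."""
    return len(x & y) / math.sqrt(len(x) * len(y))


def overlap(x, y):
    """Overlap coefficient: |X ∩ Y| / min(|X|, |Y|)."""
    return len(x & y) / min(len(x), len(y))


# Note: the normalised measures divide by set sizes, so empty sets are not handled.

# Identical one-term representatives.
x1, y1 = {"t0"}, {"t0"}
# Ten-term representatives sharing a single term.
x2 = {f"a{k}" for k in range(10)}
y2 = {"a0"} | {f"b{k}" for k in range(9)}

for name, f in [("simple", simple_matching), ("dice", dice),
                ("jaccard", jaccard), ("cosine", cosine), ("overlap", overlap)]:
    print(f"{name:8s} pair1={f(x1, y1):.3f} pair2={f(x2, y2):.3f}")
```

Only the simple matching coefficient gives both pairs the same score; each normalised measure separates the identical pair from the nearly disjoint one.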

162

search with the relevant documents spaced evenly throughout that level ... (a) q is the query of given type; (b) j is the total number of documents non-relevant to q in all levels preceding the final; (c) r is the number of relevant documents in the final level; (d) i is the number of non-relevant documents in the final level; (e) s is the number of relevant documents required from the final level to satisfy the need according to its type ... Now, to distribute the r relevant documents evenly among the non-relevant documents, we partition the non-relevant documents into r + 1 subsets each containing i/(r + 1) documents ... As a measure of effectiveness ESL is sufficient if the document collection and test queries are fixed ... where Q is the set of queries ... To extend the applicability of the measure to deal with varying test queries and document collections, we need to normalise the ESL in some way to counter the bias introduced because: (1) queries are satisfied by different numbers of documents according to the type of the query and therefore can be expected to have widely differing search lengths; (2) the density of relevant documents for a query in one document collection may be significantly different from the density in another ... The first item suggests that the ESL per desired relevant document is really what is wanted as an index of merit ...
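The partition argument can be turned into a small calculation. The sketch below rests on an assumption: the excerpt elides the closed form, so the code uses the expression that follows from the partition, ESL(q) = j + i·s/(r + 1), in which reaching the s-th relevant document in the final level means passing s subsets of i/(r + 1) non-relevant documents. The mean over the query set Q is likewise assumed to be a plain average. The function names and sample values are hypothetical.

```python
def expected_search_length(j, r, i, s):
    """Expected search length for one query, assuming
    ESL(q) = j + i * s / (r + 1).

    j: non-relevant documents in all levels preceding the final level
    r: relevant documents in the final level
    i: non-relevant documents in the final level
    s: relevant documents still required from the final level
    """
    if not 1 <= s <= r:
        raise ValueError("s must satisfy 1 <= s <= r")
    return j + (i * s) / (r + 1)


def mean_expected_search_length(queries):
    """Average ESL over a set of queries, each given as a (j, r, i, s) tuple."""
    if not queries:
        raise ValueError("the query set must not be empty")
    return sum(expected_search_length(*q) for q in queries) / len(queries)


# One query: 3 non-relevant documents precede the final level, which holds
# 2 relevant and 4 non-relevant documents; 1 more relevant document is needed.
single = expected_search_length(j=3, r=2, i=4, s=1)

# A query satisfied immediately contributes a search length of zero.
trivial = expected_search_length(j=0, r=1, i=0, s=1)

average = mean_expected_search_length([(3, 2, 4, 1), (0, 1, 0, 1)])
```

The two sample queries have quite different search lengths even though each asks for a single relevant document, which is the kind of variation the normalisation discussed above is meant to counter.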
