Concepts and similar pages to Page 165

The SMART measures

In 1966, Rocchio gave a derivation of two overall indices of merit based on recall and precision ... The first of these indices is normalised recall ... Normalised recall, $R_{norm}$, is the area between the actual case and the worst case as a proportion of the area between the best and the worst case ... see Salton [23], page 285 ... A convenient explicit form of normalised recall is:

$$R_{norm} = 1 - \frac{\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} i}{n(N-n)}$$

where $N$ is the number of documents in the system, $n$ the number of relevant documents, $r_i$ the rank of the $i$-th relevant document, and $n(N-n)$ the area between the best and the worst case (to see this, substitute $r_i = N - i + 1$ in the formula for $A_b - A_a$) ...
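A minimal Python sketch of normalised recall follows. It assumes the explicit form $R_{norm} = 1 - (\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} i)/(n(N-n))$, where $r_1, \dots, r_n$ are the 1-based ranks of the $n$ relevant documents in a strict ranking (no ties) of all $N$ documents. The function name `normalised_recall` is illustrative only.

```python
from typing import Sequence


def normalised_recall(ranks: Sequence[int], N: int) -> float:
    """Normalised recall for one query.

    ranks: distinct 1-based ranks of the n relevant documents among all N
    documents (a strict ranking with no ties is assumed).
    """
    n = len(ranks)
    if not 0 < n < N:
        # The denominator n(N - n) vanishes when no or all documents are relevant.
        raise ValueError("need 0 < n < N")
    if len(set(ranks)) != n or min(ranks) < 1 or max(ranks) > N:
        raise ValueError("ranks must be distinct integers in 1..N")
    actual = sum(ranks)          # sum of r_i
    ideal = n * (n + 1) // 2     # sum of i for i = 1..n (best case r_i = i)
    # n(N - n) is the gap between the worst case (r_i = N - i + 1) and the best case.
    return 1 - (actual - ideal) / (n * (N - n))


print(normalised_recall([1, 2, 3], 10))   # 1.0 (best case)
print(normalised_recall([8, 9, 10], 10))  # 0.0 (worst case)
print(normalised_recall([1, 3], 10))      # 0.9375
```

The best case, with every relevant document at the top of the ranking, gives 1, and the worst case gives 0, which matches the description of $R_{norm}$ as a proportion of the best-to-worst area.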

Page 166

differ considerably from those which the user feels are pertinent (Senko [21]) ... Fourthly, whereas Cooper has gone to some trouble to take account of the random element introduced by ties in the matching function, it is largely ignored in the derivation of $P_{norm}$ and $R_{norm}$ ... One further comment of interest is that Robertson [15] has shown that normalised recall has an interpretation as the area under the recall-fallout curve used by Swets ... Finally, mention should be made of two similar but simpler measures used by the SMART system ... and do not take into account the collection size $N$; $n$ is here the number of relevant documents for the particular test query ...

A normalised symmetric difference

Let us now return to basics and consider how it is that users could simply measure retrieval effectiveness ...
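The snippet omits the formulas for the two simpler SMART measures. The sketch below assumes the usual definitions, rank recall $= \sum_{i=1}^{n} i / \sum_{i=1}^{n} r_i$ and log precision $= \sum_{i=1}^{n} \ln i / \sum_{i=1}^{n} \ln r_i$, where $r_i$ are the ranks of the $n$ relevant documents. Neither involves the collection size $N$. The function names are illustrative.

```python
import math
from typing import Sequence


def _check(ranks: Sequence[int]) -> None:
    if not ranks or len(set(ranks)) != len(ranks) or min(ranks) < 1:
        raise ValueError("ranks must be distinct positive integers")


def rank_recall(ranks: Sequence[int]) -> float:
    """Sum of the ideal ranks 1..n divided by the sum of the actual ranks r_i."""
    _check(ranks)
    n = len(ranks)
    return (n * (n + 1) / 2) / sum(ranks)


def log_precision(ranks: Sequence[int]) -> float:
    """Sum of ln i over 1..n divided by the sum of ln r_i."""
    _check(ranks)
    n = len(ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    actual = sum(math.log(r) for r in sorted(ranks))
    if actual == 0.0:
        # Only possible when the single relevant document is at rank 1 (0/0); treat as perfect.
        return 1.0
    return ideal / actual


print(rank_recall([2, 4]))    # 3 / 6 = 0.5
print(log_precision([2, 4]))  # ln 2 / (ln 2 + ln 4), about 0.333
```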

Page 150

value can be calculated ... For a derivation of this relation from Bayes' Theorem, the reader should consult the author's recent paper on retrieval effectiveness [10] ...

Averaging techniques

The method of pooling or averaging of the individual P-R curves seems to have depended largely on the retrieval strategy employed ... where $A_s$ is the set of documents relevant to request $s$ ... where $B^{\lambda}_{s}$ is the set of documents retrieved at or above the co-ordination level $\lambda$ ... Figure 7 ... An alternative approach to averaging is macro evaluation, which can be independent of any parameter such as co-ordination level ...
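The equations that use $A_s$ and $B^{\lambda}_{s}$ are elided in this snippet. A common way to pool over requests at co-ordination level $\lambda$, assumed in this sketch, is to sum the counts before dividing: $R(\lambda) = \sum_s |A_s \cap B^{\lambda}_{s}| / \sum_s |A_s|$ and $P(\lambda) = \sum_s |A_s \cap B^{\lambda}_{s}| / \sum_s |B^{\lambda}_{s}|$. The data layout (a set of relevant document ids plus a map from document id to co-ordination level, i.e. the number of query terms the document matches) and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout for one request s:
#   (A_s: ids of relevant documents, {doc_id: co-ordination level})
Request = Tuple[Set[str], Dict[str, int]]


def retrieved_at(levels: Dict[str, int], lam: int) -> Set[str]:
    """B_s^lambda: documents retrieved at or above co-ordination level lam."""
    return {doc for doc, level in levels.items() if level >= lam}


def pooled_recall_precision(requests: List[Request], lam: int) -> Tuple[float, float]:
    """Recall and precision at level lam, pooling counts over all requests."""
    hits = relevant = retrieved = 0
    for A_s, levels in requests:
        B_s = retrieved_at(levels, lam)
        hits += len(A_s & B_s)
        relevant += len(A_s)
        retrieved += len(B_s)
    recall = hits / relevant if relevant else float("nan")
    precision = hits / retrieved if retrieved else float("nan")  # undefined if nothing retrieved
    return recall, precision


requests = [
    ({"d1", "d2"}, {"d1": 2, "d2": 1, "d3": 2}),
    ({"d4"}, {"d4": 1, "d5": 3}),
]
for lam in (3, 2, 1):
    print(lam, pooled_recall_precision(requests, lam))
```

Stepping $\lambda$ down from 3 to 1 yields one pooled (recall, precision) point per level, so a single averaged curve can be drawn for the whole request set.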

In the past there has been much debate about the validity of evaluations based on relevance judgments provided by erring human beings ...

Effectiveness and efficiency

Much of the research and development in information retrieval is aimed at improving the effectiveness and efficiency of retrieval ...

Page 146

There has been much debate in the past as to whether precision and recall are in fact the appropriate quantities to use as measures of effectiveness ... (1) the most commonly used pair; (2) fairly well understood quantities ... The final question, "How to evaluate?", has a large technical answer ... Before proceeding to the technical details relating to the measurement of effectiveness, it is as well to examine more closely the concept of relevance which underlies it ...

Relevance

Relevance is a subjective notion ...

Page 180

effectiveness can be calculated to infinite precision, we may be insisting on a difference when in fact it only occurs in the tenth decimal place ... Finally, although I have just explained the use of the sign test in terms of single-number measures, it is also used to detect a significant difference between precision-recall graphs ...

Bibliographic remarks

Quite a number of references to the work on evaluation have already been given in the main body of the chapter ... Buried in the report by Keen and Digger [32], Chapter 16, is an excellent discussion of the desirable properties of any measure of effectiveness ... A parameter which I have mentioned in passing but which deserves closer study is generality ... The trade-off between precision and recall has for a long time been the subject of debate ... Guazzo [39] describe an approach to the measurement of retrieval effectiveness based on information theory ... The notion of relevance has at all times attracted much discussion ...
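The sign test mentioned above can be sketched as follows, assuming each of two systems has one single-number effectiveness score per query. Each query counts as a win for one system or the other, ties are discarded, and under the null hypothesis of no difference the number of wins follows a Binomial$(n, 1/2)$ distribution. The `tie_tolerance` parameter is a hypothetical addition that treats differences too small to matter (for instance, in the tenth decimal place) as ties.

```python
from math import comb
from typing import Sequence


def sign_test(scores_a: Sequence[float], scores_b: Sequence[float],
              tie_tolerance: float = 0.0) -> float:
    """Two-sided sign-test p-value for paired per-query scores of systems A and B."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired query by query")
    plus = sum(1 for a, b in zip(scores_a, scores_b) if a - b > tie_tolerance)
    minus = sum(1 for a, b in zip(scores_a, scores_b) if b - a > tie_tolerance)
    n = plus + minus  # queries that are not ties
    if n == 0:
        return 1.0
    k = min(plus, minus)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


a = [0.6] * 10
b = [0.5] * 10
print(sign_test(a, b))  # A wins all 10 queries: 2 / 2**10 = 0.001953125
```

With `tie_tolerance=0.0`, any nonzero difference counts as a win, which is exactly the situation the text warns about; a small positive tolerance discards such negligible differences.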

Page 174

one unit of precision for an increase of one unit of recall, but will not sacrifice another unit of precision for a further unit increase in recall, i.e. $(R+1, P-1) > (R, P)$ but $(R+1, P) > (R+2, P-1)$. We conclude that the interval between $R+1$ and $R$ exceeds the interval between $P$ and $P-1$, whereas the interval between $R+1$ and $R+2$ is smaller ... Finally, we incorporate into our measurement procedure the fact that users may attach different relative importance to precision and recall ... Definition 6 ... Can we find a function satisfying all these conditions? If so, can we also interpret it in an intuitively simple way? The answer to both these questions is yes ... The scale functions are therefore $\Phi_1(P) = \alpha(1/P)$ and $\Phi_2(R) = (1-\alpha)(1/R)$ ... We now have the effectiveness measure ...
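The snippet stops before the measure itself. Assuming the two scale functions $\alpha(1/P)$ and $(1-\alpha)(1/R)$ are combined as a weighted harmonic mean, which gives the widely cited form $E = 1 - 1/(\alpha(1/P) + (1-\alpha)(1/R))$, a minimal sketch is below. The `alpha_from_beta` helper uses the common reparameterisation $\alpha = 1/(\beta^2 + 1)$; it is a hypothetical convenience not stated in this snippet.

```python
def e_measure(precision: float, recall: float, alpha: float = 0.5) -> float:
    """E = 1 - 1 / (alpha/P + (1 - alpha)/R), with 0 <= alpha <= 1 and P, R in (0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not (0.0 < precision <= 1.0 and 0.0 < recall <= 1.0):
        raise ValueError("precision and recall must lie in (0, 1]")
    phi1 = alpha * (1 / precision)     # scale function for precision
    phi2 = (1 - alpha) * (1 / recall)  # scale function for recall
    return 1 - 1 / (phi1 + phi2)


def alpha_from_beta(beta: float) -> float:
    """Hypothetical helper: beta > 1 gives recall more weight than precision."""
    return 1 / (beta ** 2 + 1)


print(e_measure(0.5, 0.5))                        # 0.5
print(e_measure(0.5, 1.0, alpha=1.0))             # 0.5: only precision counts
print(e_measure(0.5, 1.0, alpha_from_beta(2.0)))  # recall weighted more heavily
```

With $\alpha = 1/2$ the measure reduces to $E = 1 - 2PR/(P+R)$, i.e. one minus the harmonic mean of precision and recall; lower values of $E$ mean better effectiveness.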