Page 164 Concepts and similar pages

Concepts

Similarity Concept
Normalised recall
Data retrieval systems
Retrieval effectiveness
Cluster based retrieval
Measures of effectiveness
Measurement of effectiveness
Effectiveness
E measure
Information retrieval system
Relevance

Similar pages

Similarity Page Snapshot
10 In the past there has been much debate about the validity of evaluations based on relevance judgments provided by erring human beings ... Effectiveness and efficiency Much of the research and development in information retrieval is aimed at improving the effectiveness and efficiency of retrieval ...
165 In an analogous manner normalised precision is worked out ... The calculation of the areas is a bit more messy but simple to do, see Salton [23], page 298 ... The log function appears as a result of approximating Σ 1/r by its continuous analogue ∫ (1/r) dr, which is log r + constant ... The area between the worst and best case is obtained in the same way as before, using the same substitution, and is: The explicit form, with appropriate normalisation, for normalised precision is therefore: Once again it varies between 0 (worst) and 1 (best) ... A few comments about these measures are now in order ...
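The normalised recall and precision measures excerpted above can be sketched in code. This is an illustrative implementation only, assuming the standard definitions from the surrounding chapter: `ranks` holds the 1-based retrieval positions of the n relevant documents, `N` is the collection size, and the function names are mine.

```python
from math import lgamma, log


def normalised_recall(ranks, N):
    """1 minus the area between actual and best recall curves,
    normalised by the area between worst and best cases: n * (N - n)."""
    n = len(ranks)
    best = sum(range(1, n + 1))            # ranks 1..n in the ideal case
    return 1 - (sum(ranks) - best) / (n * (N - n))


def normalised_precision(ranks, N):
    """Same construction with log-ranks, following the snapshot's
    approximation of the sum of 1/r by the integral of 1/r dr."""
    n = len(ranks)
    best = sum(log(i) for i in range(1, n + 1))
    actual = sum(log(r) for r in ranks)
    # worst-minus-best area = log(N! / ((N - n)! n!)), via lgamma for stability
    denom = lgamma(N + 1) - lgamma(N - n + 1) - lgamma(n + 1)
    return 1 - (actual - best) / denom
```

Both vary between 0 (relevant documents retrieved last) and 1 (retrieved first), matching the snapshot's remark.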
166 differ considerably from those which the user feels are pertinent (Senko [21]) ... Fourthly, whereas Cooper has gone to some trouble to take account of the random element introduced by ties in the matching function, it is largely ignored in the derivation of Pnorm and Rnorm ... One further comment of interest is that Robertson [15] has shown that normalised recall has an interpretation as the area under the Recall-Fallout curve used by Swets ... Finally, mention should be made of two similar but simpler measures used by the SMART system ... and do not take into account the collection size N; n is here the number of relevant documents for the particular test query ... A normalised symmetric difference Let us now return to basics and consider how users could simply measure retrieval effectiveness ...
167 the possible ordering of this set is ignored ... Now, an intuitive way of measuring the adequacy of the retrieved set is to measure the size of the shaded area ... which is a simple composite measure ... The preceding argument in itself is not sufficient to justify the use of this particular composite measure ...
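The "shaded area" the snapshot refers to is the symmetric difference between the set of relevant documents and the set retrieved. A minimal sketch of the normalised form (the function name and the normalisation by |A| + |B| are assumptions drawn from the surrounding chapter):

```python
def normalised_symmetric_difference(relevant, retrieved):
    """Size of the symmetric difference of relevant set A and retrieved
    set B, normalised by |A| + |B|.  0 means the two sets coincide;
    1 means they are disjoint.  Ordering within the retrieved set is
    ignored, as the snapshot notes."""
    A, B = set(relevant), set(retrieved)
    return len(A ^ B) / (len(A) + len(B))
```

Because it combines misses and false drops in one number, it acts as a simple composite measure of effectiveness.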
146 There has been much debate in the past as to whether precision and recall are in fact the appropriate quantities to use as measures of effectiveness ... (1) the most commonly used pair; (2) fairly well understood quantities ... The final question, How to evaluate?, has a large technical answer ... Before proceeding to the technical details relating to the measurement of effectiveness, it is as well to examine more closely the concept of relevance which underlies it ... Relevance Relevance is a subjective notion ...
145 automatic and interactive retrieval system? Studies to gauge this are going on, but results are hard to interpret ... It should be apparent now that in evaluating an information retrieval system we are mainly concerned with providing data so that users can make a decision as to (1) whether they want such a system (a social question) and (2) whether it will be worth it ... The second question, what to evaluate?, boils down to: what can we measure that will reflect the ability of the system to satisfy the user ... (1) The coverage of the collection, that is, the extent to which the system includes relevant matter; (2) the time lag, that is, the average interval between the time the search request is made and the time an answer is given; (3) the form of presentation of the output; (4) the effort involved on the part of the user in obtaining answers to his search requests; (5) the recall of the system, that is, the proportion of relevant material actually retrieved in answer to a search request; (6) the precision of the system, that is, the proportion of retrieved material that is actually relevant ... It is claimed that (1)-(4) are readily assessed ...
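The recall and precision quantities defined in points (5) and (6) above translate directly into code; a minimal set-based sketch (function names mine):

```python
def recall(relevant, retrieved):
    """Proportion of the relevant material actually retrieved."""
    return len(set(relevant) & set(retrieved)) / len(set(relevant))


def precision(relevant, retrieved):
    """Proportion of the retrieved material that is actually relevant."""
    return len(set(relevant) & set(retrieved)) / len(set(retrieved))
```

For example, retrieving 3 documents of which 2 are relevant, out of 4 relevant in the collection, gives recall 0.5 and precision 2/3.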
163 normalising the ESL by a factor proportional to the expected number of non-relevant documents collected for each relevant one ... which has been called the expected search length reduction factor by Cooper ... where (1) R is the total number of documents in the collection relevant to q; (2) I is the total number of documents in the collection non-relevant to q; (3) S is the total desired number of documents relevant to q ... The explicit form for ESL was given before ... which is known as the mean expected search length reduction factor ... Within the framework as stated at the head of this section, this final measure meets the bill admirably ... For a further defence of its subjective nature see Cooper [1] ...
160 system, so that if we were to adopt Δ as a measure of effectiveness we could be throwing away vital information needed to make an extrapolation to the performance of other systems ... The Cooper model expected search length In 1968, Cooper [20] stated: The primary function of a retrieval system is conceived to be that of saving its users, to as great an extent as is possible, the labour of perusing and discarding irrelevant documents, in their search for relevant ones ... (1) only one relevant document is wanted; (2) some arbitrary number n is wanted; (3) all relevant documents are wanted; (4) a given proportion of the relevant documents is wanted, etc. ... Thus, the index is a measure of performance for a query of given type ... The output of a search strategy is assumed to be a weak ordering of documents ...
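Cooper's expected search length can be sketched for a weak ordering. The assumptions here (mine, but consistent with the excerpt) are: the ordering is given as a sequence of rank levels, each holding some number of relevant and non-relevant documents; levels before the stopping level are examined in full; and within the level where the search stops, ties are broken at random, so on average i*s/(r + 1) non-relevant documents are examined to reach the s relevant ones still needed.

```python
def expected_search_length(levels, wanted):
    """levels: list of (relevant, nonrelevant) counts, one pair per level
    of the weak ordering, best level first.
    wanted: number of relevant documents the user desires (query type b)."""
    j = 0        # non-relevant documents in fully examined earlier levels
    found = 0    # relevant documents found so far
    for r, i in levels:
        if found + r >= wanted:          # the search stops inside this level
            s = wanted - found           # relevant documents still needed
            return j + i * s / (r + 1)   # expected non-relevant examined
        found += r
        j += i
    raise ValueError("fewer relevant documents than wanted")
```

For instance, a single level with 2 relevant and 3 non-relevant documents gives an ESL of 1.0 when one relevant document is wanted: on average one irrelevant document is perused before the first relevant one.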
178 document collections with different sets of queries, then we can still use these measures to indicate which system satisfies the user more ... Significance tests Once we have our retrieval effectiveness figures, we may wish to establish that the difference in effectiveness under two conditions is statistically significant ... Parametric tests are inappropriate because we do not know the form of the underlying distribution ... On the face of it, non-parametric tests might provide the answer ...
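As one concrete instance of the non-parametric route mentioned in the snapshot, here is a sign test applied to paired per-query differences in effectiveness. The choice of the sign test is an assumption of mine, not the page's; it is a sketch only, making no distributional assumption beyond symmetry of signs under the null.

```python
from math import comb


def sign_test_p(differences):
    """Two-sided sign test on paired differences (system A minus system B,
    one difference per query).  Ties (zero differences) are discarded.
    Returns the p-value under the null hypothesis that positive and
    negative differences are equally likely."""
    pos = sum(1 for d in differences if d > 0)
    neg = sum(1 for d in differences if d < 0)
    n = pos + neg
    k = min(pos, neg)
    # two-sided binomial tail with p = 1/2: 2 * P(X <= k), capped at 1
    p = 2 * sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    return min(1.0, p)
```

For example, if system A beats system B on all 10 test queries, the two-sided p-value is 2/1024, small enough to reject the null at the usual levels.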