Concepts and similar pages to Page 163

Concepts

Similarity

Concept

Retrieval effectiveness

Measures of effectiveness

Measurement of effectiveness

Effectiveness

Relevance

E measure

Data retrieval systems

Probabilistic retrieval

Document clustering

Document representative

search with the relevant documents spaced evenly throughout that level ... (a) q is the query of given type; (b) j is the total number of documents non-relevant to q in all levels preceding the final; (c) r is the number of relevant documents in the final level; (d) i is the number of non-relevant documents in the final level; (e) s is the number of relevant documents required from the final level to satisfy the need according to its type ... Now, to distribute the r relevant documents evenly among the non-relevant documents, we partition the non-relevant documents into r + 1 subsets each containing i/(r + 1) documents ... As a measure of effectiveness ESL is sufficient if the document collection and test queries are fixed ... where Q is the set of queries ... To extend the applicability of the measure to deal with varying test queries and document collections, we need to normalise the ESL in some way to counter the bias introduced because: (1) queries are satisfied by different numbers of documents according to the type of the query and therefore can be expected to have widely differing search lengths; (2) the density of relevant documents for a query in one document collection may be significantly different from the density in another ... The first item suggests that the ESL per desired relevant document is really what is wanted as an index of merit ...
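
The quantities j, r, i and s defined in the excerpt above can be combined into a single figure. The sketch below assumes the usual closed form of Cooper's expected search length, ESL(q) = j + i*s/(r + 1), which the excerpt elides, so treat the formula as an assumption rather than a quotation. The function names `expected_search_length` and `mean_esl` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def expected_search_length(j, r, i, s):
    """Expected number of non-relevant documents examined for one query.

    Assumed closed form: ESL(q) = j + i*s / (r + 1), where
      j = non-relevant documents in all levels preceding the final level
      r = relevant documents in the final level
      i = non-relevant documents in the final level
      s = relevant documents still required from the final level
    """
    if min(j, r, i, s) < 0:
        raise ValueError("all counts must be non-negative")
    if not 1 <= s <= r:
        raise ValueError("s must satisfy 1 <= s <= r")
    # Exact rational arithmetic avoids floating-point rounding.
    return j + Fraction(i * s, r + 1)


def mean_esl(values):
    """Arithmetic mean of per-query ESL values over a query set Q."""
    values = list(values)
    if not values:
        raise ValueError("the query set must not be empty")
    return sum(values, Fraction(0)) / len(values)


if __name__ == "__main__":
    # 3 non-relevant documents precede the final level, which holds
    # 3 relevant and 4 non-relevant documents; 2 more relevant are needed.
    print(expected_search_length(j=3, r=3, i=4, s=2))
```

With a single tied level holding one relevant and one non-relevant document, the sketch gives an expected search length of 1/2, which matches the intuition that the non-relevant document is seen first half of the time.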

160

system so that if we were to adopt Δ as a measure of effectiveness we could be throwing away vital information needed to make an extrapolation to the performance of other systems ... The Cooper model: expected search length. In 1968, Cooper [20] stated: The primary function of a retrieval system is conceived to be that of saving its users to as great an extent as is possible, the labour of perusing and discarding irrelevant documents, in their search for relevant ones ... (a) only one relevant document is wanted; (b) some arbitrary number n is wanted; (c) all relevant documents are wanted; (d) a given proportion of the relevant documents is wanted, etc ... Thus, the index is a measure of performance for a query of given type ... The output of a search strategy is assumed to be a weak ordering of documents ...

In the past there has been much debate about the validity of evaluations based on relevance judgments provided by erring human beings ... Effectiveness and efficiency. Much of the research and development in information retrieval is aimed at improving the effectiveness and efficiency of retrieval ...

161

Unfortunately the ranking generated by a matching function is rarely a simple ordering, but more commonly a weak ordering ... For example, consider the weak ordering in Figure 7 ... depending on how many non-relevant documents precede the sixth relevant document ... 4 10 ... The above procedure leads immediately to a convenient intuitive derivation of a formula for the expected search length ...

146

There has been much debate in the past as to whether precision and recall are in fact the appropriate quantities to use as measures of effectiveness ... (1) the most commonly used pair; (2) fairly well understood quantities ... The final question (How to evaluate?) has a large technical answer ... Before proceeding to the technical details relating to the measurement of effectiveness it is as well to examine more closely the concept of relevance which underlies it ... Relevance. Relevance is a subjective notion ...

166

differ considerably from those which the user feels are pertinent (Senko [21]) ... Fourthly, whereas Cooper has gone to some trouble to take account of the random element introduced by ties in the matching function, it is largely ignored in the derivation of Pnorm and Rnorm ... One further comment of interest is that Robertson [15] has shown that normalised recall has an interpretation as the area under the Recall-Fallout curve used by Swets ... Finally, mention should be made of two similar but simpler measures used by the SMART system ... and do not take into account the collection size N; n is here the number of relevant documents for the particular test query ... A normalised symmetric difference. Let us now return to basics and consider how it is that users could simply measure retrieval effectiveness ...

178

document collections with different sets of queries then we can still use these measures to indicate which system satisfies the user more ... Significance tests. Once we have our retrieval effectiveness figures we may wish to establish that the difference in effectiveness under two conditions is statistically significant ... Parametric tests are inappropriate because we do not know the form of the underlying distribution ... On the face of it, non-parametric tests might provide the answer ...

176

conjoint structure ... The analysis is not limited to the two factors precision and recall; it could equally well be carried out for, say, the pair fallout and recall ... Presentation of experimental results. In my discussion of micro, macro evaluation, and expected search length, various ways of averaging the effectiveness measure of the set of queries arose in a natural way ... In this section the discussion will be restricted to single number measures such as a normalised symmetric difference, normalised recall, etc ... The measurements we have therefore are Za(Q1), Za(Q2), ...

145

automatic and interactive retrieval system? Studies to gauge this are going on but results are hard to interpret ... It should be apparent now that in evaluating an information retrieval system we are mainly concerned with providing data so that users can make a decision as to (1) whether they want such a system (social question) and (2) whether it will be worth it ... The second question (what to evaluate?) boils down to what can we measure that will reflect the ability of the system to satisfy the user ... (1) the coverage of the collection, that is, the extent to which the system includes relevant matter; (2) the time lag, that is, the average interval between the time the search request is made and the time an answer is given; (3) the form of presentation of the output; (4) the effort involved on the part of the user in obtaining answers to his search requests; (5) the recall of the system, that is, the proportion of relevant material actually retrieved in answer to a search request; (6) the precision of the system, that is, the proportion of retrieved material that is actually relevant ... It is claimed that (1)-(4) are readily assessed ...
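
The definitions of recall and precision in the excerpt above translate directly into set arithmetic. The following is a minimal sketch, assuming documents are identified by hashable ids; the function name `precision_recall` is hypothetical, and returning 0.0 when a set is empty is an illustrative convention, not something the excerpt specifies.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search request.

    precision = proportion of retrieved material that is relevant
    recall    = proportion of relevant material that was retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    # Assumption: an empty denominator yields 0.0 rather than an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Four documents retrieved, three relevant overall, two in common.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
    print(f"precision={p:.3f} recall={r:.3f}")
```

Sets are used so that a document retrieved twice is counted once, which keeps both proportions within the range 0 to 1.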
