Concepts and similar pages to Page 166

Concepts

Similarity

Concept

Rank recall

Log precision

Retrieval effectiveness

Effectiveness

E measure

Relevance

Data retrieval systems

Measures of effectiveness

Measurement of effectiveness

Probabilistic retrieval

The SMART measures: In 1966, Rocchio gave a derivation of two overall indices of merit based on recall and precision ... The first of these indices is normalised recall ... Normalised recall R_norm is the area between the actual case and the worst as a proportion of the area between the best and the worst ... (see Salton [23], page 285) ... A convenient explicit form of normalised recall is: ... where N is the number of documents in the system and N - n the area between the best and the worst case (to see this substitute r_i = N - i + 1 in the formula for A_b - A_a) ...
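The explicit formula is elided in the snippet above, so the following is only a sketch. It assumes the form usually quoted for Rocchio's normalised recall, R_norm = 1 - (sum of r_i - sum of i) / (n(N - n)), where r_i is the rank at which the i-th relevant document is retrieved and n is the number of relevant documents. Substituting the worst-case ranks r_i = N - i + 1 makes the numerator equal to n(N - n), which is why that product is used as the normalising area here. The function and argument names are illustrative, not taken from the source.

```python
def normalised_recall(relevant_ranks, n_docs):
    """Normalised recall for one query (assumed form, see lead-in).

    relevant_ranks: 1-based ranks at which the relevant documents appear.
    n_docs: N, the number of documents in the system.
    """
    n = len(relevant_ranks)
    if n == 0 or n >= n_docs:
        raise ValueError("need at least one relevant and one non-relevant document")
    actual = sum(relevant_ranks)       # sum of r_i for the actual ranking
    ideal = n * (n + 1) // 2           # sum of i = 1..n, the best possible case
    worst_minus_best = n * (n_docs - n)  # area between best and worst case
    return 1.0 - (actual - ideal) / worst_minus_best


if __name__ == "__main__":
    # Three relevant documents in a collection of ten.
    print(normalised_recall([1, 2, 3], 10))   # best case
    print(normalised_recall([8, 9, 10], 10))  # worst case
    print(normalised_recall([1, 5, 10], 10))  # something in between
```

Under this assumed form the measure is 1 when all relevant documents are ranked first and 0 when they are ranked last.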

146

There has been much debate in the past as to whether precision and recall are in fact the appropriate quantities to use as measures of effectiveness ... (1) the most commonly used pair; (2) fairly well understood quantities ... The final question (How to evaluate?) has a large technical answer ... Before proceeding to the technical details relating to the measurement of effectiveness it is as well to examine more closely the concept of relevance which underlies it ... Relevance: Relevance is a subjective notion ...

In the past there has been much debate about the validity of evaluations based on relevance judgments provided by erring human beings ... Effectiveness and efficiency: Much of the research and development in information retrieval is aimed at improving the effectiveness and efficiency of retrieval ...

167

the possible ordering of this set is ignored ... Now, an intuitive way of measuring the adequacy of the retrieved set is to measure the size of the shaded area ... which is a simple composite measure ... The preceding argument in itself is not sufficient to justify the use of this particular composite measure ...

162

search with the relevant documents spaced evenly throughout that level ... (a) q is the query of given type; (b) j is the total number of documents non-relevant to q in all levels preceding the final; (c) r is the number of relevant documents in the final level; (d) i is the number of non-relevant documents in the final level; (e) s is the number of relevant documents required from the final level to satisfy the need according to its type ... Now, to distribute the r relevant documents evenly among the non-relevant documents, we partition the non-relevant documents into r + 1 subsets each containing i/(r + 1) documents ... As a measure of effectiveness ESL is sufficient if the document collection and test queries are fixed ... where Q is the set of queries ... To extend the applicability of the measure to deal with varying test queries and document collections, we need to normalise the ESL in some way to counter the bias introduced because: (1) queries are satisfied by different numbers of documents according to the type of the query and therefore can be expected to have widely differing search lengths; (2) the density of relevant documents for a query in one document collection may be significantly different from the density in another ... The first item suggests that the ESL per desired relevant document is really what is wanted as an index of merit ...
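The partition argument above can be turned into a small calculation. The sketch below assumes the explicit form ESL(q) = j + s * i / (r + 1), which follows from the even-spacing argument (each of the s required relevant documents is preceded on average by i / (r + 1) non-relevant ones) but is elided in the snippet itself. The function names and the tuple layout for queries are illustrative choices, not taken from the source.

```python
def expected_search_length(j, r, i, s):
    """Expected search length for one query (assumed form, see lead-in).

    j: non-relevant documents in all levels preceding the final level
    r: relevant documents in the final level
    i: non-relevant documents in the final level
    s: relevant documents still required from the final level
    """
    if not 1 <= s <= r:
        raise ValueError("s must be between 1 and r")
    # Non-relevant documents fall into r + 1 subsets of i / (r + 1) each;
    # reaching the s-th relevant document passes s of those subsets.
    return j + s * i / (r + 1)


def mean_expected_search_length(queries):
    """Average ESL over a set of queries, each given as a (j, r, i, s) tuple."""
    if not queries:
        raise ValueError("need at least one query")
    return sum(expected_search_length(*q) for q in queries) / len(queries)


if __name__ == "__main__":
    print(expected_search_length(j=3, r=2, i=6, s=1))
    print(mean_expected_search_length([(3, 2, 6, 1), (0, 1, 4, 1)]))
```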

163

normalising the ESL by a factor proportional to the expected number of non-relevant documents collected for each relevant one ... which has been called the expected search length reduction factor by Cooper ... where (1) R is the total number of documents in the collection relevant to q; (2) I is the total number of documents in the collection non-relevant to q; (3) S is the total desired number of documents relevant to q ... The explicit form for ESL was given before ... which is known as the mean expected search length reduction factor ... Within the framework as stated at the head of this section this final measure meets the bill admirably ... For a further defence of its subjective nature see Cooper [1] ...
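The formulas for the reduction factor are elided in the snippet above, so the following is a hedged sketch rather than the source's definition. It assumes that the ESL is compared with an expected random search length of S * I / (R + 1), obtained by treating the whole collection as a single level, and that the reduction factor is the relative saving (random - ESL) / random. The function names are hypothetical.

```python
def expected_random_search_length(R, I, S):
    """Assumed random-search baseline: the whole collection as one level.

    R: total documents in the collection relevant to q
    I: total documents in the collection non-relevant to q
    S: total desired number of documents relevant to q
    """
    if not 1 <= S <= R:
        raise ValueError("S must be between 1 and R")
    return S * I / (R + 1)


def esl_reduction_factor(esl, R, I, S):
    """Relative reduction of the search length against the random baseline."""
    random_length = expected_random_search_length(R, I, S)
    if random_length == 0:
        raise ValueError("baseline is zero: no non-relevant documents")
    return (random_length - esl) / random_length


if __name__ == "__main__":
    # A query with 4 relevant and 20 non-relevant documents, 2 wanted,
    # and an expected search length of 5 under the system being tested.
    print(esl_reduction_factor(esl=5.0, R=4, I=20, S=2))
```

Under these assumptions a value near 1 means the system removes almost all of the wasted search effort, and a value of 0 means it does no better than a random ordering.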

is another example of a matching function ... A popular one used by the SMART project, which they call cosine correlation, assumes that the document and query are represented as numerical vectors in t-space, that is Q = (q_1, q_2, ... or, in the notation for a vector space with a Euclidean norm, where θ is the angle between vectors Q and D ... Serial search: Although serial searches are acknowledged to be slow, they are frequently still used as parts of larger systems ... Suppose there are N documents D_i in the system, then the serial search proceeds by calculating N values M(Q, D_i) the set of documents to be retrieved is determined ... (1) the matching function is given a suitable threshold, retrieving the documents above the threshold and discarding the ones below ... (2) the documents are ranked in increasing order of matching function value ...
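A serial search with cosine correlation as the matching function can be sketched as follows. The cosine of the angle θ between Q and D is computed as the inner product divided by the product of the Euclidean norms. The function names, the list-of-lists document representation, and the choice to list the highest-scoring documents first in the ranked mode are assumptions made for illustration.

```python
import math


def cosine_correlation(q, d):
    """Cosine of the angle between two equal-length numerical vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_q * norm_d)


def serial_search(query, documents, threshold=None):
    """Compute M(Q, D_i) for every document and select the retrieved set.

    With a threshold: return indices of documents scoring above it.
    Without: return all indices ranked by score, best match first.
    """
    scored = [(cosine_correlation(query, d), idx)
              for idx, d in enumerate(documents)]
    if threshold is not None:
        return [idx for score, idx in scored if score > threshold]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [idx for score, idx in ranked]


if __name__ == "__main__":
    docs = [[1, 0], [0, 1], [1, 1]]
    print(serial_search([1, 0], docs))                 # ranked output
    print(serial_search([1, 0], docs, threshold=0.5))  # thresholded output
```

Every document is scored on every query, which is why the cost grows linearly with N.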

178

document collections with different sets of queries then we can still use these measures to indicate which system satisfies the user more ... Significance tests: Once we have our retrieval effectiveness figures we may wish to establish that the difference in effectiveness under two conditions is statistically significant ... Parametric tests are inappropriate because we do not know the form of the underlying distribution ... On the face of it non-parametric tests might provide the answer ...

108

If the summations instead of being over A and Ā are now made over A ∩ B_i and Ā ∩ B_i, where B_i is the set of retrieved documents on the i-th iteration, then we have a query formulation which is optimal for B_i, a subset of the document collection ... where w_1 and w_2 are weighting coefficients ... Experiments have shown that relevance feedback can be very effective ... Finally, a few comments about the technique of relevance feedback in general ... Bibliographic remarks: The book by Lancaster and Fayen [16] has written an interesting survey article about on-line searching ...
