Concept: Normalised symmetric difference


Similar concepts

Similarity

Concept

Generality

Measurement of effectiveness

Measures of effectiveness

E measure

Document representative

Effectiveness

Document clustering

Maximally linked document

Typical document

Rank recall

Pages with this concept

Page 175

To facilitate interpretation of the function, we transform according to $\alpha = 1/(\beta^2 + 1)$, and find that $\partial E/\partial R = \partial E/\partial P$ when $R/P = \beta$ ... E now gives rise to the following special cases: (1) When $\alpha = 1/2$ ($\beta = 1$), $E = |A \,\Delta\, B| / (|A| + |B|)$, a normalised symmetric difference between sets A and B, where $A \,\Delta\, B = (A \cup B) - (A \cap B)$ ... (2) $E \to 1 - R$ when $\alpha \to 0$ ($\beta \to \infty$), which corresponds to a user who attaches no importance to precision ... (3) $E \to 1 - P$ when $\alpha \to 1$ ($\beta \to 0$), which corresponds to a user who attaches no importance to recall ... It is now a simple matter to show that certain other measures given in the literature are special cases of the general form E ... which is the measure recommended by Heine [3] ... One final example is the measure suggested by Vickery in 1965, which was documented by Cleverdon et al. ... which is Vickery's measure apart from a scale factor of 100 ... To summarise, we have shown that it is reasonable to assume that effectiveness in terms of precision and recall determines an additive
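The special cases above can be checked numerically. This is a minimal sketch that assumes the general form $E = 1 - 1/(\alpha/P + (1 - \alpha)/R)$, the form consistent with cases (2) and (3) but not itself shown in this snapshot; the function names and the example sets are hypothetical. Exact `Fraction` arithmetic keeps the equality checks free of rounding error.

```python
from fractions import Fraction

def e_measure(p, r, alpha):
    # Assumed general form: E = 1 - 1 / (alpha * (1/P) + (1 - alpha) * (1/R)), P, R > 0.
    return 1 - 1 / (alpha / p + (1 - alpha) / r)

def e_partials(p, r, alpha):
    # Analytic partial derivatives (dE/dP, dE/dR) of the expression above.
    d = alpha / p + (1 - alpha) / r
    return -alpha / (p * p * d * d), -(1 - alpha) / (r * r * d * d)

def alpha_from_beta(beta):
    # The transformation alpha = 1 / (beta^2 + 1), kept exact.
    return Fraction(1, 1) / (beta * beta + 1)

def nsd(a, b):
    # Normalised symmetric difference |A delta B| / (|A| + |B|).
    return Fraction(len(a ^ b), len(a) + len(b))

relevant = {1, 2, 3, 4, 5}   # hypothetical set A
retrieved = {4, 5, 6}        # hypothetical set B
hits = len(relevant & retrieved)
P = Fraction(hits, len(retrieved))  # precision = 2/3
R = Fraction(hits, len(relevant))   # recall = 2/5

# Case (1): alpha = 1/2 (beta = 1) gives the normalised symmetric difference.
print(e_measure(P, R, alpha_from_beta(1)), nsd(relevant, retrieved))
# Cases (2) and (3): the endpoints alpha = 0 and alpha = 1.
print(e_measure(P, R, 0) == 1 - R, e_measure(P, R, 1) == 1 - P)

# With beta = 2 the two partials coincide where R/P = 2, e.g. P = 1/4, R = 1/2.
dp, dr = e_partials(Fraction(1, 4), Fraction(1, 2), alpha_from_beta(2))
print(dp == dr)
```

Changing `beta` and choosing any P, R with R/P equal to it should again make the two partials agree.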

Page 167

the possible ordering of this set is ignored ... Now, an intuitive way of measuring the adequacy of the retrieved set is to measure the size of the shaded area ... which is a simple composite measure ... The preceding argument in itself is not sufficient to justify the use of this particular composite measure ...
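Assuming, as the concept heading suggests, that the shaded area is the symmetric difference of the relevant set and the retrieved set (relevant documents missed plus non-relevant documents retrieved), its size can be counted directly. The sketch below uses hypothetical document identifiers; it also shows why a raw count is not enough on its own, since the same count arises from outcomes of very different scale.

```python
def shaded_area(relevant, retrieved):
    missed = relevant - retrieved        # relevant, not retrieved
    false_drops = retrieved - relevant   # retrieved, not relevant
    return len(missed | false_drops)     # same as len(relevant ^ retrieved)

small = shaded_area({1, 2}, {2, 3})                        # 1 missed, 1 false drop
large = shaded_area(set(range(100)), set(range(1, 101)))  # 1 missed, 1 false drop
print(small, large)

# Dividing by |A| + |B| puts both outcomes on a 0..1 scale.
print(small / (2 + 2), large / (100 + 100))
```

Both outcomes have a shaded area of size 2, but the normalised values (0.5 and 0.01) distinguish a poor small search from a nearly perfect large one.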

pertain to documents, such as index tags, being careful of course to deal with the same number of index tags for each document ... I now return to the promised mathematical definition of dissimilarity ... If P is the set of objects to be clustered, a pairwise dissimilarity coefficient D is a function from $P \times P$ to the non-negative real numbers ... D1: $D(X,Y) \ge 0$ for all $X, Y \in P$; D2: $D(X,X) = 0$ for all $X \in P$; D3: $D(X,Y) = D(Y,X)$ for all $X, Y \in P$. Informally, a dissimilarity coefficient is a kind of distance function ... D4: $D(X,Y) \le D(X,Z) + D(Y,Z)$, which may be recognised as the theorem from Euclidean geometry which states that the sum of the lengths of two sides of a triangle is never less than the length of the third side ... An example of a dissimilarity coefficient satisfying D1 to D4 is ... where $X \,\Delta\, Y = (X \cup Y) - (X \cap Y)$ is the symmetric difference of sets X and Y ... and is monotone with respect to Jaccard's coefficient subtracted from 1 ...
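The axioms D1 to D4 can be checked mechanically on a small collection of sets. The sketch below is a brute-force checker over every subset of a four-element universe. The coefficient tested, 1 minus Jaccard's coefficient, is chosen here only as an example because the snapshot compares against it; `all_subsets`, `jaccard_distance` and `violations` are hypothetical names, and the convention that two empty sets are at distance 0 is an assumption.

```python
from itertools import chain, combinations

def all_subsets(universe):
    # Every subset of a small universe, as frozensets.
    items = list(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def jaccard_distance(x, y):
    # 1 - |X n Y| / |X u Y|, taking the distance between two empty sets as 0.
    union = x | y
    return 0.0 if not union else 1 - len(x & y) / len(union)

def violations(d, objects, tol=1e-12):
    # Return the first axiom that d breaks on `objects` (as a tuple), or None.
    for x in objects:
        if abs(d(x, x)) > tol:
            return ("D2", x)
        for y in objects:
            if d(x, y) < -tol:
                return ("D1", x, y)
            if abs(d(x, y) - d(y, x)) > tol:
                return ("D3", x, y)
            for z in objects:
                if d(x, y) > d(x, z) + d(y, z) + tol:
                    return ("D4", x, y, z)
    return None

print(violations(jaccard_distance, all_subsets("abcd")))
```

The check is exhaustive only over the chosen universe, so a `None` result is evidence on that universe, not a proof for all sets; any other two-argument function of sets can be passed in place of `jaccard_distance`.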

Page 176

conjoint structure ... The analysis is not limited to the two factors precision and recall; it could equally well be carried out for, say, the pair fallout and recall ...

Presentation of experimental results

In my discussion of micro- and macro-evaluation, and of expected search length, various ways of averaging the effectiveness measure over the set of queries arose in a natural way ... In this section the discussion will be restricted to single-number measures such as a normalised symmetric difference, normalised recall, etc. ... The measurements we have therefore are $Z_a(Q_1), Z_a(Q_2), \ldots$
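A minimal sketch of what the per-query measurements look like in practice. Here Z is taken to be the normalised symmetric difference for a single system, and the relevance judgements and retrieved sets for three queries are hypothetical. Each query contributes one value, and averaging those values is just one of the possible ways of summarising them.

```python
from fractions import Fraction
from statistics import mean

def nsd(relevant, retrieved):
    # Normalised symmetric difference |A delta B| / (|A| + |B|) for one query.
    return Fraction(len(relevant ^ retrieved), len(relevant) + len(retrieved))

# Hypothetical test collection: query id -> (relevant set, retrieved set).
runs = {
    "Q1": ({1, 2, 3}, {1, 2, 9}),
    "Q2": ({4, 5}, {4, 5}),
    "Q3": ({6, 7, 8, 10}, {11}),
}

z = {q: nsd(a, b) for q, (a, b) in runs.items()}  # one value per query
print(z)
print(mean(z.values()))  # a simple average over the set of queries
```

Keeping the per-query values, rather than only their mean, is what allows two systems to be compared query by query.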

Page 166

differ considerably from those which the user feels are pertinent (Senko [21]) ... Fourthly, whereas Cooper has gone to some trouble to take account of the random element introduced by ties in the matching function, it is largely ignored in the derivation of $P_{norm}$ and $R_{norm}$ ... One further comment of interest is that Robertson [15] has shown that normalised recall has an interpretation as the area under the recall-fallout curve used by Swets ... Finally, mention should be made of two similar but simpler measures used by the SMART system ... and do not take into account the collection size N; n is here the number of relevant documents for the particular test query ...

A normalised symmetric difference

Let us now return to basics and consider how it is that users could simply measure retrieval effectiveness ...
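Robertson's observation can be checked numerically. This sketch assumes the commonly quoted explicit form $R_{norm} = 1 - (\sum r_i - \sum i)/(n(N - n))$, with $r_i$ the rank of the i-th relevant document and no ties, and compares it with the area under the step-shaped recall-fallout curve traced out by walking down the ranking. The function names and the ranks are hypothetical.

```python
from fractions import Fraction

def normalised_recall(ranks, N):
    # Rnorm = 1 - (sum r_i - sum i) / (n (N - n)); ranks are the 1-based
    # positions of the n relevant documents in a tie-free ranking of N documents.
    n = len(ranks)
    ideal = sum(range(1, n + 1))
    return 1 - Fraction(sum(ranks) - ideal, n * (N - n))

def recall_fallout_area(ranks, N):
    # Walk down the ranking: a relevant document raises recall by 1/n, a
    # non-relevant one moves fallout right by 1/(N - n), adding a strip of area.
    relevant = set(ranks)
    n = len(ranks)
    recall, area = Fraction(0), Fraction(0)
    for position in range(1, N + 1):
        if position in relevant:
            recall += Fraction(1, n)
        else:
            area += recall * Fraction(1, N - n)
    return area

ranks = [1, 3, 4, 8]  # hypothetical ranks of 4 relevant documents among N = 10
print(normalised_recall(ranks, 10), recall_fallout_area(ranks, 10))
```

The two agree because $\sum r_i - \sum i$ counts the (relevant, non-relevant) pairs in which the non-relevant document is ranked higher, and the area under the recall-fallout curve is the proportion of pairs ordered the other way.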

Page 164

The SMART measures

In 1966, Rocchio gave a derivation of two overall indices of merit based on recall and precision ... The first of these indices is normalised recall ... Normalised recall ($R_{norm}$) is the area between the actual case and the worst as a proportion of the area between the best and the worst ... (see Salton [23], page 285) ... A convenient explicit form of normalised recall is ... where N is the number of documents in the system and $N - n$ the area between the best and the worst case (to see this, substitute $r_i = N - i + 1$ in the formula for $A_b - A_a$) ...
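The substitution step can be written out as a short derivation. It assumes that recall is plotted against rank and rises by $1/n$ at each relevant document, so that the area $A_b - A_a$ between the best-case and actual curves is a sum of horizontal strips of height $1/n$ and width $r_i - i$; $A_w$ for the worst case is a label introduced here, and the final line is the commonly quoted explicit form under that assumption rather than a transcription of the formula missing from this snapshot.

```latex
\begin{aligned}
A_b - A_a &= \frac{1}{n}\sum_{i=1}^{n} (r_i - i) \\
\text{worst case } r_i = N - i + 1:\quad
A_b - A_w &= \frac{1}{n}\sum_{i=1}^{n} (N - 2i + 1)
           = \frac{1}{n}\bigl(n(N + 1) - n(n + 1)\bigr) = N - n \\
R_{norm} &= 1 - \frac{A_b - A_a}{N - n}
          = 1 - \frac{\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} i}{n(N - n)}
\end{aligned}
```

As a check, the best case $r_i = i$ gives $R_{norm} = 1$ and the worst case gives $R_{norm} = 0$.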

Page 163

normalising the ESL by a factor proportional to the expected number of non-relevant documents collected for each relevant one ... which has been called the expected search length reduction factor by Cooper ... where (1) R is the total number of documents in the collection relevant to q; (2) I is the total number of documents in the collection non-relevant to q; (3) S is the total desired number of documents relevant to q ... The explicit form for ESL was given before ... which is known as the mean expected search length reduction factor ... Within the framework as stated at the head of this section, this final measure meets the bill admirably ... For a further defence of its subjective nature see Cooper [1] ...
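A sketch of the reduction factor, under stated assumptions. The snapshot elides the explicit formulas, so this takes the expected random search length to be $S \cdot I/(R + 1)$ (the expected number of non-relevant documents examined before the S-th relevant one when the collection is read in random order, i.e. S times the expected number of non-relevant documents per relevant one), and the reduction factor to be (random search length - ESL) / random search length. Only the random-order expectation is verified here, by exhaustive enumeration; the per-query ESL values and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def random_search_length(R, I, S):
    # Assumed closed form: expected non-relevant documents seen before the
    # S-th relevant one when the R + I documents are read in random order.
    return Fraction(S * I, R + 1)

def brute_force_random_search_length(R, I, S):
    # Average over every placement of the R relevant documents among R + I slots.
    total, count = 0, 0
    for slots in combinations(range(R + I), R):
        sth = slots[S - 1]        # 0-based position of the S-th relevant document
        total += sth - (S - 1)    # non-relevant documents ahead of it
        count += 1
    return Fraction(total, count)

def reduction_factor(esl, R, I, S):
    # Assumed definition: (random search length - ESL) / random search length.
    rsl = random_search_length(R, I, S)
    return (rsl - esl) / rsl

print(random_search_length(3, 5, 2), brute_force_random_search_length(3, 5, 2))

# Hypothetical queries: (ESL achieved by the system, R, I, S).
queries = [(Fraction(1), 3, 5, 2), (Fraction(2), 4, 16, 1)]
factors = [reduction_factor(*q) for q in queries]
print(factors, sum(factors) / len(factors))  # mean reduction factor over queries
```

A factor of 0 means the system does no better than reading the collection in random order; a factor of 1 means no non-relevant documents had to be examined.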