Similar concepts
Pages with this concept
Similarity | Page | Snapshot
| 174 |
one unit of precision for an increase of one unit of recall, but will not sacrifice another unit of precision for a further unit increase in recall, i.e.
...(R+1, P-1) > (R, P) but (R+1, P) > (R+2, P-1). We conclude that the interval between R+1 and R exceeds the interval between P and P-1, whereas the interval between R+1 and R+2 is smaller
...Finally, we incorporate into our measurement procedure the fact that users may attach different relative importance to precision and recall
...Definition 6
...Can we find a function satisfying all these conditions? If so, can we also interpret it in an intuitively simple way? The answer to both these questions is yes
...The scale functions are therefore Phi1(P) = alpha(1/P) and Phi2(R) = (1 - alpha)(1/R)
...We now have the effectiveness measure
... |
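The scale functions in this snippet combine into van Rijsbergen's effectiveness measure E = 1 - 1/(alpha(1/P) + (1 - alpha)(1/R)). A minimal sketch under that reading (function and parameter names are mine):

```python
def effectiveness(precision, recall, alpha=0.5):
    """E = 1 - 1 / (alpha * (1/P) + (1 - alpha) * (1/R)).

    alpha encodes the user's relative importance of precision
    versus recall; alpha = 0.5 gives 1 minus the harmonic mean
    of P and R. Lower E means more effective retrieval.
    """
    if precision == 0 or recall == 0:
        return 1.0  # degenerate case: worst possible effectiveness
    return 1.0 - 1.0 / (alpha / precision + (1.0 - alpha) / recall)
```

With alpha = 0.5 this reduces to 1 minus the familiar F score, so perfect precision and recall give E = 0.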
| 119 |
and The importance of writing it this way, apart from its simplicity, is that for each document x, to calculate g(x) we simply add the coefficients ci for those index terms that are present, i.e.
...The constant C, which has been assumed the same for all documents x, will of course vary from query to query, but it can be interpreted as the cut-off applied to the retrieval function
...Let us now turn to the other part of g(x), namely ci, and let us try and interpret it in terms of the conventional contingency table
...There will be one such table for each index term; I have shown it for the index term i, although the subscript i has not been used in the cells
...This is in fact the weighting formula F4 used by Robertson and Sparck Jones [1] in their so-called retrospective experiments
... |
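The F4 formula cited in this snippet is, in the usual contingency-table notation (r relevant documents containing the term, n documents containing it, R relevant documents, N documents in all), log[r(N - n - R + r) / ((R - r)(n - r))]. A sketch of it as a per-term weight; the smoothing constant k is my addition, since the raw retrospective form breaks on empty cells:

```python
import math

def f4_weight(r, n, R, N, k=0.0):
    """Robertson/Sparck Jones F4 relevance weight for one index term.

    r: relevant documents containing the term
    n: documents containing the term
    R: relevant documents in the collection
    N: documents in the collection
    k: additive smoothing (0.5 is a common choice; k = 0 gives the
       raw retrospective formula, which is undefined on zero cells).
    """
    return math.log(((r + k) * (N - n - R + r + k)) /
                    ((R - r + k) * (n - r + k)))
```

The weight is positive when the term occurs proportionally more often in the relevant documents than in the rest of the collection.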
| 25 |
collection
...I am arguing that in using distributional information about index terms to provide, say, index term weighting we are really attacking the old problem of controlling exhaustivity and specificity
...These terms are defined in the introduction on page 10
...If we go back to Luhn's original ideas, we remember that he postulated a varying discrimination power for index terms as a function of the rank order of their frequency of occurrence, the highest discrimination power being associated with the middle frequencies
...Attempts have been made to apply weighting based on the way the index terms are distributed in the entire collection
...The difference between the last mode of weighting and the previous one may be summarised by saying that document-frequency weighting places emphasis on content description, whereas weighting by specificity attempts to emphasise the ability of terms to discriminate one document from another
...Salton and Yang [24] have recently attempted to combine both methods of weighting by looking at both inter-document frequencies |
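The "weighting based on the way the index terms are distributed in the entire collection" described in this snippet is usually realised as inverse document frequency; that identification is mine, since the snippet does not name a formula. A minimal sketch:

```python
import math

def inverse_document_frequency(n, N):
    """Weight a term by collection-wide rarity: idf = log(N / n),
    where n documents out of N contain the term. Rare terms, which
    best discriminate one document from another, score highest;
    a term in every document scores zero."""
    return math.log(N / n)
```

This gives exactly the behaviour the snippet contrasts with plain document-frequency weighting: emphasis shifts from content description to discrimination power.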
| 29 |
subsets differing in the extent to which they are about a word w, then the distribution of w can be described by a mixture of two Poisson distributions
...here p1 is the probability of a random document belonging to one of the subsets, and x1 and x2 are the mean occurrences in the two classes
...Although Harter [31] uses 'function' in his wording of this assumption, I think 'measure' would have been more appropriate
...assumption 1 we can calculate the probability of relevance for any document from one of these classes
...that is used to make the decision whether to assign an index term w that occurs k times in a document
...Finally, although tests have shown that this model assigns sensible index terms, it has not been tested from the point of view of its effectiveness in retrieval
...Discrimination and/or representation. There are two conflicting ways of looking at the problem of characterising documents for retrieval
... |
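The 2-Poisson mixture described in this snippet can be written out directly: the probability that word w occurs k times in a random document is p1 times a Poisson with mean x1 plus (1 - p1) times a Poisson with mean x2. A sketch (names follow the snippet's symbols):

```python
import math

def poisson(k, mean):
    """Poisson probability of exactly k occurrences."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def two_poisson(k, p1, x1, x2):
    """Harter's 2-Poisson mixture: probability that word w occurs
    k times in a random document.
    p1: probability the document belongs to the first subset
        (the one more 'about' w)
    x1, x2: mean occurrences of w in the two subsets."""
    return p1 * poisson(k, x1) + (1.0 - p1) * poisson(k, x2)
```

Indexing decisions then compare how likely a count of k is under the two components, which is what the snippet's decision about assigning an index term w occurring k times refers to.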
| 117 |
D1 and D2 can be shown to be equivalent under certain conditions
...[P(x|w1)P(w1) > P(x|w2)P(w2) => x is relevant, otherwise x is non-relevant] (D3). Notice that P(x) has disappeared from the equation since it does not affect the outcome of the decision
...[R(w1|x) < R(w2|x)] <=> [(l21 - l11) P(x|w1)P(w1) > (l12 - l22) P(x|w2)P(w2)]. When a special loss function is chosen, namely lij = 0 for i = j and lij = 1 otherwise, which implies that no loss is assigned to a correct decision (quite reasonable) and unit loss to any error (not so reasonable), then we have [R(w1|x) < R(w2|x)] <=> [P(x|w1)P(w1) > P(x|w2)P(w2)], which shows the equivalence of D2 and D3, and hence of D1 and D2, under a binary loss function
...This completes the derivation of the decision rule to be used to decide relevance or non-relevance or, to put it differently, to retrieve or not to retrieve
...Form of retrieval function. The previous section was rather abstract and left the connection of the various probabilities with IR rather open
... |
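Decision rule D3 from this snippet is directly executable: retrieve x whenever P(x|w1)P(w1) > P(x|w2)P(w2). A sketch assuming the likelihoods and the prior probability of relevance are given (argument names are mine):

```python
def retrieve(likelihood_rel, likelihood_nonrel, prior_rel):
    """Decision rule D3: x is relevant iff
    P(x|w1)P(w1) > P(x|w2)P(w2).
    P(x) is omitted because it scales both sides equally
    and so cannot affect the outcome of the decision."""
    return likelihood_rel * prior_rel > likelihood_nonrel * (1.0 - prior_rel)
```

Note how a small prior probability of relevance can overturn a likelihood ratio that favours relevance, which is the practical content of keeping the priors in the rule.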
| 125 |
document x for different settings of a pair of variables (xi, xj), i.e.
...and similarly for the other three settings of (xi, xj), i.e.
...This shows how simple the non linear weighting function really is
...Estimation of parameters. The use of a weighting function of the kind derived above in actual retrieval requires the estimation of pertinent parameters
...Here I have adopted a labelling scheme for the cells in which [x] means the number of occurrences in the cell labelled x
... |
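The parameter estimation this snippet refers to starts from cell counts of a per-term contingency table. Given binary document vectors and relevance judgements, the four marginals for index term i can be tallied as below (a sketch; the function and its signature are mine, but the quantities match the usual table: r, n, R, N):

```python
def contingency_cells(docs, relevant, i):
    """Per-term contingency counts for index term i.

    docs: list of binary vectors (1 = term present in the document)
    relevant: parallel list of booleans (relevance judgements)
    Returns (r, n, R, N): relevant documents containing term i,
    documents containing term i, relevant documents, all documents.
    The four cells of the table follow by subtraction, e.g.
    [non-relevant with term] = n - r.
    """
    N = len(docs)
    R = sum(relevant)
    n = sum(d[i] for d in docs)
    r = sum(d[i] for d, rel in zip(docs, relevant) if rel)
    return r, n, R, N
```

These counts feed directly into weights such as F4 and into the estimates of pi and qi discussed around them.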
| 118 |
Theorem is the best way of getting at it
...P(x|wi) = P(x1|wi) P(x2|wi) ... P(xn|wi)
...Later I shall show how this stringent assumption may be relaxed
...Let us now take the simplified form of P(x|wi) and work out what the decision rule will look like
...pi = Prob(xi = 1 | w1), qi = Prob(xi = 1 | w2)
...In words, pi (qi) is the probability that, if the document is relevant (non-relevant), the i-th index term will be present
...To appreciate how these expressions work, the reader should check that P((0,1,1,0,0,1)|w1) = (1 - p1) p2 p3 (1 - p4)(1 - p5) p6
...where the constants ai, bi and e are obvious
... |
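Under the independence assumption in this snippet, P(x|w1) is the product over terms of pi when the term is present and (1 - pi) when it is absent. A sketch that makes the reader's check mechanical:

```python
def doc_likelihood(x, p):
    """P(x|w) under term independence: the product over i of
    p[i] if term i is present (x[i] == 1) else (1 - p[i])."""
    result = 1.0
    for xi, pi in zip(x, p):
        result *= pi if xi else 1.0 - pi
    return result
```

For x = (0,1,1,0,0,1) this multiplies (1 - p1) p2 p3 (1 - p4)(1 - p5) p6, exactly the product the snippet asks the reader to verify.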
| 26 |
and intra-document frequencies
...Salton and his co-workers have developed an interesting tool for describing whether an index is good or bad
... |
| 126 |
In general we would have two tables of this kind when setting up our function g(x), one for estimating the parameters associated with P(x|w1) and one for P(x|w2)
...The estimates shown above are examples of point estimates
...Two basic assumptions made in deriving any estimation rule through Bayesian decision theory are: (1) the form of the prior distribution on the parameter space, i.e.
...the probability distribution on the possible values of the binomial parameter; and (2) the form of the loss function used to measure the error made in estimating the parameter
...Once these two assumptions are made explicit by defining the form of the distribution and loss function, then, together with Bayes' Principle, which seeks to minimise the posterior conditional expected loss given the observations, we can derive a number of different estimation rules
...where x is the number of successes in n trials, and a and b are parameters dictated by the particular combination of prior and loss |
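The family of estimation rules this snippet describes all take the form (x + a)/(n + a + b), with a and b fixed by the choice of prior and loss. A sketch (the defaults here are my choice, not the snippet's):

```python
def binomial_estimate(x, n, a=0.5, b=0.5):
    """Point estimate of a binomial parameter: (x + a) / (n + a + b).

    a and b are dictated by the combination of prior and loss;
    a = b = 0 recovers the maximum-likelihood estimate x/n, while
    positive a, b shrink the estimate away from 0 and 1, which
    matters when a cell of the contingency table is empty."""
    return (x + a) / (n + a + b)
```

This is why such estimates are preferred over raw proportions when plugging small contingency-table counts into weights like ci: zero cells no longer produce degenerate probabilities.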
| 115 |
Basic probabilistic model. Since we are assuming that each document is described by the presence or absence of index terms, any document can be represented by a binary vector x = (x1, x2, ...), where xi = 0 or 1 indicates absence or presence of the i-th index term
...w1 = document is relevant, w2 = document is non-relevant
...The theory that follows is at first rather abstract; the reader is asked to bear with it, since we soon return to the nuts and bolts of retrieval
...So, in terms of these symbols, what we wish to calculate for each document is P(w1|x) and perhaps P(w2|x), so that we may decide which is relevant and which is non-relevant
...Here P(wi) is the prior probability of relevance (i = 1) or non-relevance (i = 2); P(x|wi) is proportional to what is commonly known as the likelihood of relevance or non-relevance given x; in the continuous case this would be a density function and we would write p(x|wi)
...which is the probability of observing x on a random basis given that it may be either relevant or non relevant
... |
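Putting the quantities in this snippet together, Bayes' Theorem gives P(w1|x) = P(x|w1)P(w1)/P(x), where P(x) = P(x|w1)P(w1) + P(x|w2)P(w2) is the probability of observing x whether it is relevant or not. A sketch (argument names are mine):

```python
def posterior_relevance(lik_rel, lik_nonrel, prior_rel):
    """P(w1|x) by Bayes' Theorem. The denominator is
    P(x) = P(x|w1)P(w1) + P(x|w2)P(w2), the probability of
    observing x on a random basis, relevant or not."""
    evidence = lik_rel * prior_rel + lik_nonrel * (1.0 - prior_rel)
    return lik_rel * prior_rel / evidence
```

When the two likelihoods are equal, the observation x carries no information and the posterior simply returns the prior.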