Page 133: Concepts and similar pages

Concepts

Retrieval effectiveness
Experimental information retrieval
Operational information retrieval
Index term
Information structure
Information retrieval definition
Cluster-based retrieval
Term
Relevance
Effectiveness

Similar pages

Page 128: ... objected to on the same grounds that one might object to the probability of Newton's Second Law of Motion being the case ... To approach the problem in this way would be useless unless one believed that for many index terms the distribution over the relevant documents is different from that over the non-relevant documents ... The elaboration in terms of ranking rather than just discrimination is trivial: the cut-off set by the constant in g(x) is gradually relaxed, thereby increasing the number of documents retrieved, or assigned to the relevant category ... If one is prepared to let the user set the cut-off after retrieval has taken place, then the need for a theory about cut-off disappears ...
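The relaxing-cut-off idea is easy to make concrete. A minimal sketch (hypothetical weights and documents; it assumes, as on page 119, that g(x) is a sum of per-term coefficients): ranking every document by g(x) once means any choice of cut-off is just a prefix of the ranked list.

```python
# Sketch: ranking by g(x) rather than a fixed discrimination rule.
# Weights and documents below are hypothetical; g(x) is assumed to be
# a sum of per-term coefficients c_i over the terms present (p. 119).

def g(doc_terms, c):
    """Score a document by adding the coefficients of its index terms."""
    return sum(c.get(t, 0.0) for t in doc_terms)

c = {"probabilistic": 1.2, "retrieval": 0.8, "newton": -0.5}
docs = {
    "d1": {"probabilistic", "retrieval"},
    "d2": {"retrieval", "newton"},
    "d3": {"newton"},
}

# Rank once by g(x); relaxing the cut-off just walks further down
# the list, so the user can set the cut-off after retrieval.
for d in sorted(docs, key=lambda d: g(docs[d], c), reverse=True):
    print(d, g(docs[d], c))
```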
Page 137: ... the different contributions made to the measure by the different cells ... Discrimination gain hypothesis: In the derivation above I have made the assumption of independence or dependence in a straightforward way ... $P(x_i, x_j) = P(x_i, x_j \mid w_1)P(w_1) + P(x_i, x_j \mid w_2)P(w_2)$ and $P(x_i)P(x_j) = [P(x_i \mid w_1)P(w_1) + P(x_i \mid w_2)P(w_2)]\,[P(x_j \mid w_1)P(w_1) + P(x_j \mid w_2)P(w_2)]$. If we assume conditional independence on both $w_1$ and $w_2$, then $P(x_i, x_j) = P(x_i \mid w_1)P(x_j \mid w_1)P(w_1) + P(x_i \mid w_2)P(x_j \mid w_2)P(w_2)$. For unconditional independence as well, we must have $P(x_i, x_j) = P(x_i)P(x_j)$. This will only happen when $P(w_1) = 0$ or $P(w_2) = 0$, or $P(x_i \mid w_1) = P(x_i \mid w_2)$, or $P(x_j \mid w_1) = P(x_j \mid w_2)$; in words, when at least one of the index terms is useless at discriminating relevant from non-relevant documents ... Kendall and Stuart [26] define a partial correlation coefficient for any two distributions by ...
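A quick numeric check of the point above (probabilities made up for illustration): two terms that are conditionally independent within each class are still unconditionally dependent as long as both terms discriminate between the classes.

```python
# Numeric check: conditional independence within each of w1 and w2
# does not give unconditional independence unless at least one term
# stops discriminating. All probabilities are made up.

def check(p_w1, p_i, p_j):
    p_w2 = 1.0 - p_w1
    # Joint of (x_i = 1, x_j = 1) under within-class independence:
    p_ij = p_i["w1"] * p_j["w1"] * p_w1 + p_i["w2"] * p_j["w2"] * p_w2
    # Marginals:
    pi = p_i["w1"] * p_w1 + p_i["w2"] * p_w2
    pj = p_j["w1"] * p_w1 + p_j["w2"] * p_w2
    return p_ij, pi * pj

# Both terms discriminate -> joint != product of marginals:
print(check(0.3, {"w1": 0.8, "w2": 0.2}, {"w1": 0.6, "w2": 0.1}))
# x_j is useless (P(x_j|w1) = P(x_j|w2)) -> the two coincide:
print(check(0.3, {"w1": 0.8, "w2": 0.2}, {"w1": 0.4, "w2": 0.4}))
```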
Page 123: ... probability function P(x), and of course a better approximation than the one afforded by making assumption A1 ... The goodness of the approximation is measured by a well-known function (see, for example, Kullback [12]): if P(x) and P_a(x) are two discrete probability distributions, then $I(P, P_a) = \sum_x P(x) \log \frac{P(x)}{P_a(x)}$ is a measure of the extent to which P_a(x) approximates P(x); that this is indeed the case is shown by Ku and Kullback [11] ... If the extent to which two index terms i and j deviate from independence is measured by the expected mutual information measure EMIM (see Chapter 3, p. 41), then the best approximation P_t(x), in the sense of minimising I(P, P_t), is given by the maximum spanning tree MST (see Chapter 3, p. ...) ... is a maximum ... One way of looking at the MST is that it incorporates the most significant of the dependences between the variables, subject to the global constraint that the sum of them should be a maximum ...
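A compact sketch of the construction this passage describes (a made-up toy collection; empirical probabilities stand in for whatever estimates one actually has): weight each pair of index terms by EMIM, then take a maximum spanning tree over those weights.

```python
# Toy Chow/Liu-style construction: EMIM weights on term pairs,
# then a maximum spanning tree (Kruskal on descending weights).
# The document collection below is made up for illustration.

from itertools import combinations
from math import log

docs = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"a", "c"}, {"b", "d"}]
terms = sorted(set().union(*docs))
N = len(docs)

def emim(i, j):
    """Expected mutual information between the binary variables for terms i, j."""
    total = 0.0
    for xi in (False, True):
        for xj in (False, True):
            pij = sum((i in d) == xi and (j in d) == xj for d in docs) / N
            pi = sum((i in d) == xi for d in docs) / N
            pj = sum((j in d) == xj for d in docs) / N
            if pij > 0:
                total += pij * log(pij / (pi * pj))
    return total

# Kruskal: add edges in decreasing EMIM order, skipping any that
# would close a cycle; the result is the dependence tree (MST).
parent = {t: t for t in terms}
def find(t):
    while parent[t] != t:
        parent[t] = parent[parent[t]]
        t = parent[t]
    return t

tree = []
for u, v in sorted(combinations(terms, 2), key=lambda e: emim(*e), reverse=True):
    if find(u) != find(v):
        parent[find(u)] = find(v)
        tree.append((u, v))
print(tree)  # n-1 edges: the most significant pairwise dependences
```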
Page 134: ... which from a computational point of view would simplify things enormously ... An alternative way of using the dependence tree ... Association Hypothesis: Some of the arguments advanced in the previous section can be construed as implying that the only dependence tree we have enough information to construct is the one on the entire document collection ... The basic idea underlying term clustering was explained in Chapter 2 ... If an index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this ...
Page 119: ... and ... The importance of writing it this way, apart from its simplicity, is that for each document x, to calculate g(x) we simply add the coefficients c_i for those index terms that are present, i.e. ... The constant C, which has been assumed the same for all documents x, will of course vary from query to query, but it can be interpreted as the cut-off applied to the retrieval function ... Let us now turn to the other part of g(x), namely c_i, and let us try to interpret it in terms of the conventional contingency table ... There will be one such table for each index term; I have shown it for the index term i, although the subscript i has not been used in the cells ... This is in fact the weighting formula F4 used by Robertson and Sparck Jones [1] in their so-called retrospective experiments ...
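The snippet elides the F4 equation itself. Below is a sketch using the form in which F4 is usually quoted; treat the exact expression as an assumption here, since the excerpt does not show it (r = relevant documents containing the term, R = relevant documents, n = documents containing the term, N = collection size).

```python
from math import log

def f4_weight(N, R, n, r):
    """F4-style relevance weight for one index term, from its
    contingency table (exact form assumed; the snippet elides it).

    N: documents in the collection    R: relevant documents
    n: documents containing the term  r: relevant docs containing it
    In practice each cell is often smoothed (e.g. +0.5) to avoid
    zero numerators or denominators.
    """
    return log((r * (N - n - R + r)) / ((R - r) * (n - r)))

# Example with made-up counts:
print(f4_weight(N=1000, R=20, n=100, r=10))  # ~2.29
```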
Page 129: ... we work with the ratio ... In the latter case we do not see the retrieval problem as one of discriminating between relevant and non-relevant documents; instead we merely wish to compute P(relevance|x) for each document x and present the user with documents in decreasing order of this probability ... The decision rules derived above are couched in terms of P(x|w_i) ... I will now proceed to discuss ways of using this probabilistic model of retrieval, and at the same time discuss some of the practical problems that arise ... The curse of dimensionality: In deriving the decision rules I assumed that a document is represented by an n-dimensional vector, where n is the size of the index term vocabulary ...
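The ratio the snippet elides is presumably the likelihood ratio $P(x \mid w_1)/P(x \mid w_2)$; a one-line derivation of why ranking by it and ranking by $P(w_1 \mid x)$ give the same ordering:

```latex
% The posterior is a monotone increasing function of the
% likelihood ratio, so the two document orderings coincide:
\[
P(w_1 \mid x)
  = \frac{P(x \mid w_1)P(w_1)}{P(x \mid w_1)P(w_1) + P(x \mid w_2)P(w_2)}
  = \left( 1 + \frac{P(w_2)}{P(w_1)}
      \cdot \frac{P(x \mid w_2)}{P(x \mid w_1)} \right)^{-1}.
\]
```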
Page 114: ... the system to its user will be the best that is obtainable on the basis of those data ... Of course this principle raises many questions as to the acceptability of the assumptions ... The probability ranking principle assumes that we can calculate P(relevance|document); not only that, it assumes that we can do it accurately ... So, returning now to the immediate problem, which is to calculate, or estimate, P(relevance|document) ...
Page 115: Basic probabilistic model ... Since we are assuming that each document is described by the presence or absence of index terms, any document can be represented by a binary vector $x = (x_1, x_2, \ldots, x_n)$, where $x_i = 0$ or $1$ indicates the absence or presence of the i-th index term ... $w_1$: document is relevant; $w_2$: document is non-relevant ... The theory that follows is at first rather abstract; the reader is asked to bear with it, since we soon return to the nuts and bolts of retrieval ... So, in terms of these symbols, what we wish to calculate for each document is $P(w_1 \mid x)$ (and perhaps $P(w_2 \mid x)$) so that we may decide which is relevant and which is non-relevant ... Here $P(w_i)$ is the prior probability of relevance ($i = 1$) or non-relevance ($i = 2$); $P(x \mid w_i)$ is proportional to what is commonly known as the likelihood of relevance or non-relevance given $x$; in the continuous case this would be a density function and we would write $p(x \mid w_i)$ ... which is the probability of observing $x$ on a random basis, given that it may be either relevant or non-relevant ...
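The formula the snippet elides between these definitions is, from the way each factor is described (prior, likelihood, and the probability of observing x on a random basis), Bayes' theorem:

```latex
% Posterior probability of relevance for a binary document vector x,
% with P(x) expanded over the two classes as the snippet describes:
\[
P(w_i \mid x) = \frac{P(x \mid w_i)\, P(w_i)}{P(x)},
\qquad
P(x) = P(x \mid w_1)P(w_1) + P(x \mid w_2)P(w_2).
\]
```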
Page 140: ... derives from the work of Yu and his collaborators [28, 29] ... According to Doyle [32], p. ... The model in this chapter also connects with two other ideas in earlier research ... or, in words: for any document, the probability of relevance is inversely proportional to the probability with which it will occur on a random basis ...