Similarity | Page | Snapshot
| 138 |
where ρ ... ρ_{X,Y·W} = 0, which implies, using the expression for the partial correlation,

ρ_{X,Y·W} = (ρ_{X,Y} − ρ_{X,W} ρ_{Y,W}) / [(1 − ρ_{X,W}²)(1 − ρ_{Y,W}²)]^{1/2},

that ρ_{X,Y} = ρ_{X,W} ρ_{Y,W}. Since ρ_{X,Y} < 1, ρ_{X,W} < 1 and ρ_{Y,W} < 1, this in turn implies that under the hypothesis of conditional independence ρ_{X,Y} < ρ_{X,W} or ρ_{Y,W}. Hence, if W is a random variable representing relevance, the correlation between it and either index term is greater than the correlation between the index terms
...Qualitatively I shall try and generalise this to functions other than correlation coefficients. Linfoot [27] defines a type of informational correlation measure by

r_ij = [1 − exp(−2 I(x_i, x_j))]^{1/2},  0 ≤ r_ij < 1,

where I(x_i, x_j) is the now familiar expected mutual information measure
...I(x_i, x_j) < I(x_i, W) or I(x_j, W), where I ...
...Discrimination Gain Hypothesis: Under the hypothesis of conditional independence, the statistical information contained in one index term about another is less than the information contained in either index term about relevance
... |
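The expected mutual information measure and Linfoot's informational correlation above can be sketched as follows; a minimal illustration assuming binary term-occurrence variables, natural logarithms, and a hypothetical joint distribution:

```python
import math

def emim(joint):
    """Expected mutual information I(x_i, x_j) in nats for two binary
    variables, given their 2x2 joint distribution joint[a][b]."""
    px = [joint[0][0] + joint[0][1], joint[1][0] + joint[1][1]]
    py = [joint[0][0] + joint[1][0], joint[0][1] + joint[1][1]]
    total = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p = joint[a][b]
            if p > 0:
                total += p * math.log(p / (px[a] * py[b]))
    return total

def linfoot(joint):
    """Linfoot's informational correlation r = (1 - exp(-2 I))**0.5."""
    return math.sqrt(1.0 - math.exp(-2.0 * emim(joint)))

# Hypothetical figures: two index terms that co-occur more than chance.
joint = [[0.6, 0.1],
         [0.1, 0.2]]
i_xy = emim(joint)     # positive, since the terms are associated
r_xy = linfoot(joint)  # lies between 0 and 1, as the formula requires
```

Under independence the joint factorises, I(x_i, x_j) = 0, and r_ij = 0; r_ij approaches 1 only as the expected mutual information grows without bound.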
| 120 |
convenience let us set

K_i = log [ r(N − n − R + r) / ((R − r)(n − r)) ]

There are a number of ways of looking at K_i
...Typically the weight K_i(N, r, n, R) is estimated from a contingency table in which N is not the total number of documents in the system, but is instead some subset specifically chosen to enable K_i to be estimated
...The index terms are not independent. Although it may be mathematically convenient to assume that the index terms are independent, it by no means follows that it is realistic to do so
... |
| 119 |
and ... The importance of writing it this way, apart from its simplicity, is that for each document x, to calculate g(x) we simply add the coefficients c_i for those index terms that are present, i.e. ...
...The constant C, which has been assumed the same for all documents x, will of course vary from query to query, but it can be interpreted as the cut-off applied to the retrieval function
...Let us now turn to the other part of g(x), namely c_i, and let us try and interpret it in terms of the conventional contingency table
...There will be one such table for each index term; I have shown it for index term i, although the subscript i has not been used in the cells
...This is in fact the weighting formula F4 used by Robertson and Sparck Jones [1] in their so-called retrospective experiments
... |
| 25 |
collection
...I am arguing that in using distributional information about index terms to provide, say, index term weighting, we are really attacking the old problem of controlling exhaustivity and specificity
...These terms are defined in the introduction on page 10
...If we go back to Luhn's original ideas, we remember that he postulated a varying discrimination power for index terms as a function of the rank order of their frequency of occurrence, the highest discrimination power being associated with the middle frequencies
...Attempts have been made to apply weighting based on the way the index terms are distributed in the entire collection
...The difference between the last mode of weighting and the previous one may be summarised by saying that document-frequency weighting places emphasis on content description, whereas weighting by specificity attempts to emphasise the ability of terms to discriminate one document from another
...Salton and Yang [24] have recently attempted to combine both methods of weighting by looking at both inter-document frequencies |
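Weighting by specificity is commonly identified with inverse document frequency; a minimal sketch, using a hypothetical toy collection:

```python
import math

# Hypothetical toy collection: each document is a set of index terms.
docs = [
    {'information', 'retrieval', 'index'},
    {'information', 'theory'},
    {'retrieval', 'evaluation'},
    {'index', 'term', 'retrieval'},
]

N = len(docs)

def idf(term):
    """Specificity weight log(N / n), where n is the number of documents
    containing the term: rarer terms discriminate better."""
    n = sum(1 for d in docs if term in d)
    return math.log(N / n) if n else 0.0

# 'retrieval' occurs in 3 of 4 documents -> low specificity weight;
# 'theory' occurs in 1 of 4 -> high specificity weight.
w_common = idf('retrieval')
w_rare = idf('theory')
```

A middle-frequency term, in Luhn's sense, would sit between these two extremes.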
| 135 |
The way we interpret this hypothesis is that a term in the query used by a user is likely to be there because it is a good discriminator and hence we are interested in its close associates
...Discrimination power of an index term. On p. ...
...and in fact there made the comment that it was a measure of the power of term i to discriminate between relevant and non-relevant documents
...Instead of K_i, I suggest using the information radius, defined in Chapter 3 on p. ...
... |
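A minimal sketch of the information radius for two discrete distributions, assuming natural logarithms and hypothetical term-occurrence figures; with equal prior weights it reduces to the familiar Jensen-Shannon divergence:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in nats between discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def information_radius(p, q, u1=0.5, u2=0.5):
    """Information radius of p and q with prior weights u1, u2: the
    weighted mean KL divergence of each distribution to their mixture
    m = u1*p + u2*q."""
    m = [u1 * pi + u2 * qi for pi, qi in zip(p, q)]
    return u1 * kl(p, m) + u2 * kl(q, m)

# Hypothetical occurrence distributions of one index term over the
# relevant and the non-relevant documents.
p = [0.7, 0.3]  # P(term present | relevant), P(term absent | relevant)
q = [0.2, 0.8]  # the same, given non-relevance
radius = information_radius(p, q)
```

Unlike K_i, the information radius is always finite and non-negative, and it vanishes exactly when the two distributions coincide, i.e. when the term carries no discrimination power.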
| 140 |
derives from the work of Yu and his collaborators [28, 29] ... According to Doyle [32], p. ...
...The model in this chapter also connects with two other ideas in earlier research
...or, in words: for any document, the probability of relevance is inversely proportional to the probability with which it will occur on a random basis
... |
| 128 |
objected to on the same grounds that one might object to the probability of Newton's Second Law of Motion being the case
...To approach the problem in this way would be useless unless one believed that for many index terms the distribution over the relevant documents is different from that over the non-relevant documents
...The elaboration in terms of ranking rather than just discrimination is trivial: the cut-off set by the constant in g(x) is gradually relaxed, thereby increasing the number of documents retrieved or assigned to the relevant category
...If one is prepared to let the user set the cut-off after retrieval has taken place, then the need for a theory about cut-off disappears
... |
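Ranking by g(x) and then relaxing the cut-off can be sketched as follows; the weights and documents are hypothetical:

```python
def rank_by_g(docs, weights, C=0.0):
    """Score each document by g(x): the sum of coefficients c_i for the
    index terms it contains, plus C; return documents best-first."""
    scored = [(sum(weights.get(t, 0.0) for t in d) + C, d) for d in docs]
    return sorted(scored, key=lambda s: s[0], reverse=True)

weights = {'probabilistic': 2.1, 'retrieval': 0.4, 'the': -0.5}
docs = [{'probabilistic', 'retrieval'}, {'retrieval', 'the'}, {'the'}]
ranking = rank_by_g(docs, weights)

# Retrieving at a cut-off is just truncating the ranking; relaxing the
# cut-off walks further down the same list.
retrieved = [d for score, d in ranking if score > 0.0]
```

Letting the user scan the ranking and stop wherever they like is exactly the a posteriori cut-off described above: no theory of where to cut is needed.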
| 133 |
3
...It must be emphasised that in the non-linear case the estimation of the parameters for g(x) will ideally involve a different MST for each of P(x | w_1) and P(x | w_2)
...There is a choice of how one would implement the model for g(x), depending on whether one is interested in setting the cut-off a priori or a posteriori
...If one assumes that the cut-off is set a posteriori, then we can rank the documents according to P(w_1 | x) and leave the user to decide when he has seen enough
...to calculate (estimate) the probability of relevance for each document x
... |
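The MST construction mentioned above can be sketched as a Chow/Liu-style maximum spanning tree over EMIM edge weights; the binary document vectors are hypothetical, and in the non-linear case one such tree would be grown from the relevant documents and another from the non-relevant ones:

```python
import math
from itertools import combinations

def emim(docs, i, j):
    """Expected mutual information I(x_i, x_j) estimated from a list of
    binary document vectors (tuples of 0/1)."""
    n = len(docs)
    total = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pab = sum(1 for d in docs if d[i] == a and d[j] == b) / n
            pa = sum(1 for d in docs if d[i] == a) / n
            pb = sum(1 for d in docs if d[j] == b) / n
            if pab > 0:
                total += pab * math.log(pab / (pa * pb))
    return total

def dependence_tree(docs, m):
    """Grow a maximum spanning tree (Prim's algorithm) over terms 0..m-1,
    using EMIM as the edge weight."""
    weight = {e: emim(docs, *e) for e in combinations(range(m), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < m:
        u, v = max(((u, v) for u in in_tree for v in range(m)
                    if v not in in_tree),
                   key=lambda e: weight[tuple(sorted(e))])
        edges.append((u, v))
        in_tree.add(v)
    return edges

# Terms 0 and 1 always co-occur; term 2 is independent of both, so the
# strongly associated pair is joined first.
docs = [(1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 0, 1)]
tree = dependence_tree(docs, 3)
```

The resulting tree keeps only the m − 1 strongest pairwise dependences, which is what makes the non-linear g(x) tractable compared with modelling all 2^m term combinations.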
| 134 |
which from a computational point of view would simplify things enormously
...An alternative way of using the dependence tree (Association Hypothesis). Some of the arguments advanced in the previous section can be construed as implying that the only dependence tree we have enough information to construct is the one on the entire document collection
...The basic idea underlying term clustering was explained in Chapter 2
...If an index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this
... |