Concept: Discrimination gain hypothesis

Similar concepts

Concept

Relevance

Indexing

Probability of relevance

Document clustering

Document frequency weighting

Document representative

Index term

Maximally linked document

Typical document

Inverse document frequency weighting

Pages with this concept

Page

Snapshot

138

where $\rho$ ... $\rho(X, Y \mid W) = 0$, which implies, using the expression for the partial correlation, that $\rho(X, Y) = \rho(X, W)\,\rho(Y, W)$. Since $|\rho(X, Y)| \le 1$, $|\rho(X, W)| \le 1$, $|\rho(Y, W)| \le 1$, this in turn implies that under the hypothesis of conditional independence $|\rho(X, Y)| \le |\rho(X, W)|$ or $|\rho(Y, W)|$. Hence if W is a random variable representing relevance then the correlation between it and either index term is greater than the correlation between the index terms ... Qualitatively I shall try and generalise this to functions other than correlation coefficients. Linfoot [27] defines a type of informational correlation measure by $r_{ij} = \left(1 - \exp(-2 I(x_i, x_j))\right)^{1/2}$, $0 \le r_{ij} \le 1$, or equivalently $I(x_i, x_j) = -\tfrac{1}{2} \log(1 - r_{ij}^2)$, where $I(x_i, x_j)$ is the now familiar expected mutual information measure ... $I(x_i, x_j) \le I(x_i, W)$ or $I(x_j, W)$, where I ... Discrimination Gain Hypothesis: Under the hypothesis of conditional independence the statistical information contained in one index term about another is less than the information contained in either index term about relevance ...
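The inequality in this snippet can be checked numerically. The following is a minimal sketch, not the author's method: it assumes two binary index terms and a binary relevance variable W, builds a joint distribution in which the terms are conditionally independent given W, and compares the three expected mutual information values. All function names and probability values are hypothetical, chosen only for illustration.

```python
import math
from itertools import product


def joint_table(p_w1, p_xi_given_w, p_xj_given_w):
    """Joint P(xi, xj, w) for binary terms, ASSUMING conditional independence given w.

    p_w1 is P(w1); p_xi_given_w and p_xj_given_w hold P(x = 1 | w1), P(x = 1 | w2).
    """
    p_w = (p_w1, 1.0 - p_w1)
    table = {}
    for a, b, k in product((0, 1), repeat=3):
        pa = p_xi_given_w[k] if a else 1.0 - p_xi_given_w[k]
        pb = p_xj_given_w[k] if b else 1.0 - p_xj_given_w[k]
        table[(a, b, k)] = p_w[k] * pa * pb
    return table


def pair_marginal(table, axes):
    """Sum the three-way table down to the two axes named in `axes`."""
    out = {}
    for key, p in table.items():
        sub = (key[axes[0]], key[axes[1]])
        out[sub] = out.get(sub, 0.0) + p
    return out


def mutual_information(pair):
    """Expected mutual information (in nats) of a two-way joint table."""
    pu, pv = {}, {}
    for (u, v), p in pair.items():
        pu[u] = pu.get(u, 0.0) + p
        pv[v] = pv.get(v, 0.0) + p
    return sum(
        p * math.log(p / (pu[u] * pv[v]))
        for (u, v), p in pair.items()
        if p > 0.0
    )


def linfoot_r(info):
    """Informational correlation r = (1 - exp(-2 I))^(1/2); clamps rounding noise."""
    return math.sqrt(max(0.0, 1.0 - math.exp(-2.0 * info)))


# Illustrative (made-up) values: P(w1) = 0.2, term i and term j probabilities per class.
table = joint_table(0.2, (0.8, 0.1), (0.6, 0.2))
i_ij = mutual_information(pair_marginal(table, (0, 1)))  # I(xi, xj)
i_iw = mutual_information(pair_marginal(table, (0, 2)))  # I(xi, W)
i_jw = mutual_information(pair_marginal(table, (1, 2)))  # I(xj, W)

print("I(xi,xj) =", i_ij)
print("I(xi,W)  =", i_iw)
print("I(xj,W)  =", i_jw)
print("hypothesis holds:", i_ij <= min(i_iw, i_jw))
```

Because `linfoot_r` is monotone in its argument, the same ordering carries over to the informational correlation values, which is the step the snippet uses to move from correlation coefficients to information measures.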

137

the different contributions made to the measure by the different cells ... Discrimination gain hypothesis: In the derivation above I have made the assumption of independence or dependence in a straightforward way ... $P(x_i, x_j) = P(x_i, x_j \mid w_1) P(w_1) + P(x_i, x_j \mid w_2) P(w_2)$ and $P(x_i) P(x_j) = [P(x_i \mid w_1) P(w_1) + P(x_i \mid w_2) P(w_2)]\,[P(x_j \mid w_1) P(w_1) + P(x_j \mid w_2) P(w_2)]$. If we assume conditional independence on both $w_1$ and $w_2$ then $P(x_i, x_j) = P(x_i \mid w_1) P(x_j \mid w_1) P(w_1) + P(x_i \mid w_2) P(x_j \mid w_2) P(w_2)$. For unconditional independence as well, we must have $P(x_i, x_j) = P(x_i) P(x_j)$. This will only happen when $P(w_1) = 0$ or $P(w_2) = 0$, or $P(x_i \mid w_1) = P(x_i \mid w_2)$, or $P(x_j \mid w_1) = P(x_j \mid w_2)$, or in words, when at least one of the index terms is useless at discriminating relevant from non-relevant documents ... Kendall and Stuart [26] define a partial correlation coefficient for any two distributions by
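The claim that conditional independence generally forces unconditional dependence can be illustrated with a short sketch. It assumes binary index terms and two classes w1 and w2, and computes the gap P(xi, xj) - P(xi) P(xj) for the event that both terms occur. Expanding the two expressions shows that this gap equals P(w1) P(w2) [P(xi | w1) - P(xi | w2)] [P(xj | w1) - P(xj | w2)], which vanishes exactly under the four conditions listed in the snippet. The function names and probability values below are hypothetical.

```python
def marginal(p_w1, p_x_given_w):
    """P(x = 1) = P(x | w1) P(w1) + P(x | w2) P(w2)."""
    return p_x_given_w[0] * p_w1 + p_x_given_w[1] * (1.0 - p_w1)


def joint_cond_indep(p_w1, p_xi_given_w, p_xj_given_w):
    """P(xi = 1, xj = 1), ASSUMING conditional independence on both w1 and w2."""
    return (
        p_xi_given_w[0] * p_xj_given_w[0] * p_w1
        + p_xi_given_w[1] * p_xj_given_w[1] * (1.0 - p_w1)
    )


def dependence_gap(p_w1, p_xi_given_w, p_xj_given_w):
    """P(xi, xj) - P(xi) P(xj); zero means unconditional independence as well."""
    joint = joint_cond_indep(p_w1, p_xi_given_w, p_xj_given_w)
    return joint - marginal(p_w1, p_xi_given_w) * marginal(p_w1, p_xj_given_w)


def closed_form_gap(p_w1, p_xi_given_w, p_xj_given_w):
    """P(w1) P(w2) [P(xi|w1) - P(xi|w2)] [P(xj|w1) - P(xj|w2)]."""
    return (
        p_w1
        * (1.0 - p_w1)
        * (p_xi_given_w[0] - p_xi_given_w[1])
        * (p_xj_given_w[0] - p_xj_given_w[1])
    )


# Both terms discriminate between the classes: the gap is non-zero.
gap_discriminating = dependence_gap(0.2, (0.8, 0.1), (0.6, 0.2))

# Term j is useless at discriminating (same probability in both classes): gap is zero.
gap_useless_term = dependence_gap(0.2, (0.8, 0.1), (0.3, 0.3))

print("gap, both terms discriminate:", gap_discriminating)
print("gap, term j non-discriminating:", gap_useless_term)
```

The closed form is included only as a cross-check on the direct computation; it is a consequence of the two equations in the snippet rather than something stated there.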