Concept: Conditional independence

Similar concepts

Similarity

Concept

Heuristic cluster methods

Pages with this concept

Similarity

Page

Snapshot

137

the different contributions made to the measure by the different cells ... Discrimination gain hypothesis. In the derivation above I have made the assumption of independence or dependence in a straightforward way ... $P(x_i, x_j) = P(x_i, x_j \mid w_1) P(w_1) + P(x_i, x_j \mid w_2) P(w_2)$ and $P(x_i) P(x_j) = [P(x_i \mid w_1) P(w_1) + P(x_i \mid w_2) P(w_2)] [P(x_j \mid w_1) P(w_1) + P(x_j \mid w_2) P(w_2)]$. If we assume conditional independence on both $w_1$ and $w_2$ then $P(x_i, x_j) = P(x_i \mid w_1) P(x_j \mid w_1) P(w_1) + P(x_i \mid w_2) P(x_j \mid w_2) P(w_2)$. For unconditional independence as well, we must have $P(x_i, x_j) = P(x_i) P(x_j)$. This will only happen when $P(w_1) = 0$ or $P(w_2) = 0$, or $P(x_i \mid w_1) = P(x_i \mid w_2)$, or $P(x_j \mid w_1) = P(x_j \mid w_2)$, or in words, when at least one of the index terms is useless at discriminating relevant from non-relevant documents ... Kendall and Stuart [26] define a partial correlation coefficient for any two distributions by
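The condition for unconditional independence quoted in this snapshot can be checked numerically. The sketch below is illustrative only: the function names and the probability values are assumptions, not taken from the source. It treats $x_i$ and $x_j$ as binary events and compares $P(x_i, x_j) - P(x_i) P(x_j)$, computed under conditional independence, with the factored form $P(w_1) P(w_2) [P(x_i \mid w_1) - P(x_i \mid w_2)] [P(x_j \mid w_1) - P(x_j \mid w_2)]$, which vanishes exactly in the cases the text lists.

```python
def independence_gap(p_w1, pi_w1, pi_w2, pj_w1, pj_w2):
    """Return P(xi, xj) - P(xi) P(xj), assuming xi and xj are conditionally
    independent given w1 and given w2 (hypothetical helper for illustration)."""
    p_w2 = 1.0 - p_w1
    # Joint probability under conditional independence on both classes.
    joint = pi_w1 * pj_w1 * p_w1 + pi_w2 * pj_w2 * p_w2
    # Marginals obtained by summing over the two classes.
    p_i = pi_w1 * p_w1 + pi_w2 * p_w2
    p_j = pj_w1 * p_w1 + pj_w2 * p_w2
    return joint - p_i * p_j


def factored_gap(p_w1, pi_w1, pi_w2, pj_w1, pj_w2):
    """The same quantity written as a product; it is zero when any factor is zero."""
    return p_w1 * (1.0 - p_w1) * (pi_w1 - pi_w2) * (pj_w1 - pj_w2)


# Made-up values: both terms discriminate between w1 and w2, so the gap is non-zero.
print(independence_gap(0.3, 0.8, 0.2, 0.7, 0.1))
# Term i is useless at discriminating (same probability in both classes): gap is ~0.
print(independence_gap(0.3, 0.5, 0.5, 0.7, 0.1))
```

With these values the two functions agree up to floating-point error, which is consistent with the statement that unconditional independence holds only when a class is empty or one term fails to discriminate.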

138

where $\rho$ ... $\rho(X, Y \mid W) = 0$, which implies, using the expression for the partial correlation, that $\rho(X, Y) = \rho(X, W)\,\rho(Y, W)$. Since $|\rho(X, Y)| < 1$, $|\rho(X, W)| < 1$, $|\rho(Y, W)| < 1$, this in turn implies that under the hypothesis of conditional independence $|\rho(X, Y)| < |\rho(X, W)|$ or $|\rho(Y, W)|$. Hence if $W$ is a random variable representing relevance then the correlation between it and either index term is greater than the correlation between the index terms ... Qualitatively I shall try and generalise this to functions other than correlation coefficients. Linfoot [27] defines a type of informational correlation measure by $r_{ij} = \{1 - \exp[-2 I(x_i, x_j)]\}^{1/2}$, $0 < r_{ij} < 1$, or ... where $I(x_i, x_j)$ is the now familiar expected mutual information measure ... $I(x_i, x_j) < I(x_i, W)$ or $I(x_j, W)$, where $I$ ... Discrimination Gain Hypothesis: Under the hypothesis of conditional independence the statistical information contained in one index term about another is less than the information contained in either index term about relevance ...
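The two quantities this snapshot relies on can be sketched in a few lines. The snapshot elides the partial correlation formula, so the code assumes the standard first-order form $\rho(X, Y \mid W) = [\rho(X, Y) - \rho(X, W) \rho(Y, W)] / \sqrt{[1 - \rho(X, W)^2][1 - \rho(Y, W)^2]}$. It also reads the informational correlation measure as $r = \sqrt{1 - \exp(-2I)}$, with $I$ the expected mutual information in nats. Function names and the example table are hypothetical.

```python
import math


def partial_correlation(rho_xy, rho_xw, rho_yw):
    """Standard first-order partial correlation of X and Y given W.
    Assumes |rho_xw| < 1 and |rho_yw| < 1 so the denominator is non-zero."""
    denom = math.sqrt((1.0 - rho_xw ** 2) * (1.0 - rho_yw ** 2))
    return (rho_xy - rho_xw * rho_yw) / denom


def expected_mutual_information(joint):
    """Expected mutual information (in nats) of a joint probability table,
    given as a list of rows whose entries sum to 1."""
    px = [sum(row) for row in joint]        # marginal of the row variable
    py = [sum(col) for col in zip(*joint)]  # marginal of the column variable
    total = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0.0:                     # zero cells contribute nothing
                total += p * math.log(p / (px[a] * py[b]))
    return total


def informational_correlation(joint):
    """Informational correlation r = sqrt(1 - exp(-2 I)), clamped at 0
    to guard against tiny negative values from rounding."""
    i = expected_mutual_information(joint)
    return math.sqrt(max(0.0, 1.0 - math.exp(-2.0 * i)))


# When rho(X,Y) equals rho(X,W) * rho(Y,W), the partial correlation vanishes.
print(partial_correlation(0.5 * 0.6, 0.5, 0.6))
# A made-up 2x2 joint table for two dependent binary index terms.
table = [[0.4, 0.1], [0.1, 0.4]]
print(expected_mutual_information(table), informational_correlation(table))
```

Because $r$ is a monotone function of $I$, ranking term pairs by either quantity gives the same order, so the argument can be stated in terms of $I$ directly.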

139

I must emphasise that the above argument leading to the hypothesis is not a proof ... One consequence of the discrimination hypothesis is that it provides a rationale for ranking the index terms connected to a query term in the dependence tree in order of $I(\text{term}, \text{query term})$ values to reflect the order of discrimination power values ... Bibliographic remarks. The basic background reading for this chapter is contained in but a few papers ...

169

The model. We start by examining the structure which it is reasonable to assume for the measurement of effectiveness ... If $R$ is the set of possible recall values and $P$ is the set of possible precision values then we are interested in the set $R \times P$ with a relation on it ... Definition 1 ... (1) Connectedness: either $e_1 \geq e_2$ or $e_2 \geq e_1$; (2) Transitivity: if $e_1 \geq e_2$ and $e_2 \geq e_3$ then $e_1 \geq e_3$. We insist that if two pairs can be ordered both ways then $(R_1, P_1) \sim (R_2, P_2)$, i.e. ... We now turn to a second condition which is commonly called independence ... Definition 2 ... All we are saying here is, given that at a constant recall (precision) we find a difference in effectiveness for two values of precision (recall), then this difference cannot be removed or reversed by changing the constant value ... We now come to a condition which is not quite as obvious as the preceding ones ...

171

In other words we are ensuring that the equation $(R', P') \sim (R, P)$ is soluble for $R'$ provided that there exist $\bar{R}, \underline{R}$ such that $(\bar{R}, P') \geq (R, P) \geq (\underline{R}, P')$ ... The fifth condition is not limiting in any way but needs to be stated ... Definition 5 ... Thus we require that variation in one while leaving the other constant gives a variation in effectiveness ... Finally we need a technical condition which will not be explained here, that is the Archimedean property for each component ... We now have six conditions on the relational structure $\langle R \times P, \geq \rangle$ which in the theory of measurement are necessary and sufficient conditions for it to be an additive conjoint structure ... In our case we can therefore expect to find real valued functions $\Phi_1$ on $R$ and $\Phi_2$ on $P$ and a function $F$ from $Re \times Re$ into $Re$, 1:1 in each variable, such that, for all $R, R' \in R$ and $P, P' \in P$ we have: $(R, P) \geq (R', P') \Leftrightarrow F[\Phi_1(R), \Phi_2(P)] \geq F[\Phi_1(R'), \Phi_2(P')]$. Note that although the same symbol $\geq$ is used, the first is a binary relation on $R \times P$, the second is the usual one on $Re$, the set of reals ... In other words there are numerical scales $\Phi_i$ on the two components and a rule $F$ for combining them such that the resultant measure preserves the qualitative ordering of effectiveness ...

176

conjoint structure ... The analysis is not limited to the two factors precision and recall; it could equally well be carried out for, say, the pair fallout and recall ... Presentation of experimental results. In my discussion of micro-, macro-evaluation, and expected search length, various ways of averaging the effectiveness measure of the set of queries arose in a natural way ... In this section the discussion will be restricted to single number measures such as a normalised symmetric difference, normalised recall, etc. ... The measurements we have therefore are $Z_a(Q_1), Z_a(Q_2)$, ...