Independence measurements

Similar concepts

Concept
Experimental information retrieval
Information retrieval system
Information retrieval definition
Operational information retrieval
Data retrieval systems
Retrieval effectiveness
Automatic document classification
Cluster based retrieval
Information measure
Document clustering

Pages with this concept

Page Snapshot
169 The model. We start by examining the structure which it is reasonable to assume for the measurement of effectiveness ... If R is the set of possible recall values and P is the set of possible precision values then we are interested in the set R × P with a relation on it ... Definition 1 ... (1) Connectedness: either e1 ≥ e2 or e2 ≥ e1. (2) Transitivity: if e1 ≥ e2 and e2 ≥ e3 then e1 ≥ e3. We insist that if two pairs can be ordered both ways then (R1, P1) ~ (R2, P2), i ... We now turn to a second condition which is commonly called independence ... Definition 2 ... All we are saying here is, given that at a constant recall (precision) we find a difference in effectiveness for two values of precision (recall), then this difference cannot be removed or reversed by changing the constant value ... We now come to a condition which is not quite as obvious as the preceding ones ...
123 probability function P(x), and of course a better approximation than the one afforded by making assumption A1 ... The goodness of the approximation is measured by a well-known function (see, for example, Kullback [12]); if P(x) and Pa(x) are two discrete probability distributions then ... That this is indeed the case is shown by Ku and Kullback [11] ... is a measure of the extent to which Pa(x) approximates P(x) ... If the extent to which two index terms i and j deviate from independence is measured by the expected mutual information measure EMIM (see Chapter 3, p. 41) ... then the best approximation Pt(x), in the sense of minimising I(P, Pt), is given by the maximum spanning tree MST (see Chapter 3, p. ... is a maximum ... One way of looking at the MST is that it incorporates the most significant of the dependences between the variables subject to the global constraint that the sum of them should be a maximum ...
41 keyword is indicated by a zero or one in the i-th position respectively ... where summation is over the total number of different keywords in the document collection ... Salton considered document representatives as binary vectors embedded in an n-dimensional Euclidean space, where n is the total number of index terms ... can then be interpreted as the cosine of the angular separation of the two binary vectors X and Y ... where (X, Y) is the inner product and ... X = (x1, ... we get ... Some authors have attempted to base a measure of association on a probabilistic model [18] ... When xi and xj are independent, P(xi) P(xj) = P(xi, xj) and so I(xi, xj) = 0 ...
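
The last snapshot states that the expected mutual information measure vanishes when two index terms are independent. The sketch below illustrates that property for a pair of binary terms. It assumes the standard mutual-information form, the sum over all value pairs of P(xi, xj) log [P(xi, xj) / (P(xi) P(xj))], which the snapshots do not spell out. The function name `emim` and the example probability tables are hypothetical, chosen only for illustration.

```python
import math


def emim(joint):
    """Expected mutual information of two binary index terms.

    `joint` is a 2x2 nested list where joint[a][b] is assumed to hold
    P(xi = a, xj = b); the four entries should sum to 1.
    """
    # Marginal distributions obtained by summing rows and columns.
    p_i = [joint[a][0] + joint[a][1] for a in (0, 1)]
    p_j = [joint[0][b] + joint[1][b] for b in (0, 1)]

    total = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p = joint[a][b]
            if p > 0:  # a zero cell contributes nothing to the sum
                total += p * math.log(p / (p_i[a] * p_j[b]))
    return total


# Hypothetical joint tables (not taken from the source).
# Independent case: each cell is the product of marginals 0.7/0.3 and 0.6/0.4.
independent = [[0.42, 0.28], [0.18, 0.12]]
# Fully dependent case: the two terms always occur together or not at all.
dependent = [[0.5, 0.0], [0.0, 0.5]]

print(emim(independent))  # approximately zero, up to rounding error
print(emim(dependent))    # strictly positive
```

In the independent table every ratio inside the logarithm is 1, so each term of the sum is zero; any departure from independence makes the measure positive, which is why it can serve as a measure of how far two index terms deviate from independence.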