Concept: EMIM

Similar concepts

Similarity

Concept

Discrimination power

Dependence tree

Cluster time dependence

Association Hypothesis

First order tree dependence

Term dependence

Conditional independence

Estimation of parameters

Dependence stochastic

Decision rule

Pages with this concept

Similarity

Page

Snapshot

131

When computing I(x_i, x_j) for the purpose of constructing an MST we need only to know the rank ordering of the I(x_i, x_j)'s ... then I(x_i, x_j) will be strictly monotone with ... This is an extremely simple formulation of EMIM and easy to compute ... The problem of what to do with zero entries in one of the cells 1 to 4 is taken care of by letting 0 log 0 = 0 ... Next we discuss the possibility of approximation ... d(x_i, x_j) = P(x_i = 1, x_j = 1) - P(x_i = 1) P(x_j = 1) to measure the deviation from independence for any two index terms i and j ...
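As an illustration of the snapshot above, the following is a minimal sketch of how EMIM and the simpler deviation d(x_i, x_j) might be computed from a 2x2 table of document counts. The function names and the cell layout (n11, n10, n01, n00) are assumptions made for this example, not notation taken from the source; the 0 log 0 = 0 convention is the one the snapshot mentions.

```python
import math


def _term(p_joint, p_a, p_b):
    """One summand p * log(p / (p_a * p_b)), using the convention 0 log 0 = 0."""
    if p_joint == 0.0:
        return 0.0
    return p_joint * math.log(p_joint / (p_a * p_b))


def emim(n11, n10, n01, n00):
    """Expected mutual information of two binary index terms.

    Hypothetical cell layout: n11 = documents with both terms,
    n10 = only term i, n01 = only term j, n00 = neither term.
    """
    n = n11 + n10 + n01 + n00
    if n == 0:
        raise ValueError("empty contingency table")
    joint = {(1, 1): n11 / n, (1, 0): n10 / n, (0, 1): n01 / n, (0, 0): n00 / n}
    p_i = {1: (n11 + n10) / n, 0: (n01 + n00) / n}
    p_j = {1: (n11 + n01) / n, 0: (n10 + n00) / n}
    # A non-zero joint cell implies non-zero marginals, so no division by zero.
    return sum(_term(p, p_i[a], p_j[b]) for (a, b), p in joint.items())


def deviation(n11, n10, n01, n00):
    """d(x_i, x_j) = P(x_i=1, x_j=1) - P(x_i=1) P(x_j=1)."""
    n = n11 + n10 + n01 + n00
    if n == 0:
        raise ValueError("empty contingency table")
    return n11 / n - ((n11 + n10) / n) * ((n11 + n01) / n)
```

Both functions return zero for a table in which the two terms are independent, and the zero-cell convention means a table with empty cells needs no special handling by the caller.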

123

probability function P(x), and of course a better approximation than the one afforded by making assumption A1 ... The goodness of the approximation is measured by a well known function (see, for example, Kullback [12]); if P(x) and P_a(x) are two discrete probability distributions then ... That this is indeed the case is shown by Ku and Kullback [11] ... is a measure of the extent to which P_a(x) approximates P(x) ... If the extent to which two index terms i and j deviate from independence is measured by the expected mutual information measure EMIM (see Chapter 3, p. 41) ... then the best approximation P_t(x), in the sense of minimising I(P, P_t), is given by the maximum spanning tree MST (see Chapter 3, p ... is a maximum ... One way of looking at the MST is that it incorporates the most significant of the dependences between the variables subject to the global constraint that the sum of them should be a maximum ...
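The maximum spanning tree mentioned in this snapshot can be sketched as follows. This is a minimal example using Kruskal's algorithm with a union-find structure, which is one standard way to build such a tree and is not necessarily the procedure the source describes. The function name and the edge-weight dictionary are assumptions for illustration; the weights stand in for pairwise dependence scores such as EMIM.

```python
def maximum_spanning_tree(n_terms, weights):
    """Build a maximum spanning tree over index terms 0..n_terms-1.

    weights: dict mapping a pair (i, j) to a dependence score for that pair.
    Returns the chosen edges as a list of (i, j, weight) tuples.
    """
    parent = list(range(n_terms))

    def find(x):
        # Follow parent links to the component root, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Visit edges from the strongest dependence to the weakest.
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components, so no cycle
            parent[root_i] = root_j
            tree.append((i, j, w))
    return tree
```

Because the algorithm only compares weights, any strictly monotone transformation of the scores yields the same tree; only their rank ordering matters.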

139

I must emphasise that the above argument leading to the hypothesis is not a proof ... One consequence of the discrimination hypothesis is that it provides a rationale for ranking the index terms connected to a query term in the dependence tree in order of I(term, query term) values to reflect the order of discrimination power values ... Bibliographic remarks: The basic background reading for this chapter is contained in but a few papers ...

keyword is indicated by a zero or one in the i-th position respectively ... where summation is over the total number of different keywords in the document collection ... Salton considered document representatives as binary vectors embedded in an n-dimensional Euclidean space, where n is the total number of index terms ... can then be interpreted as the cosine of the angular separation of the two binary vectors X and Y ... where (X, Y) is the inner product and ... X = (x_1, ... we get ... Some authors have attempted to base a measure of association on a probabilistic model [18] ... When x_i and x_j are independent P(x_i) P(x_j) = P(x_i, x_j) and so I(x_i, x_j) = 0 ...
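The cosine interpretation for binary document vectors mentioned in this snapshot can be illustrated with a short sketch. The function name and the choice to return 0.0 for an all-zero vector are assumptions made here; the source snapshot does not say how empty representatives are treated.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length binary vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    inner = sum(a * b for a, b in zip(x, y))   # inner product (X, Y)
    norm_x = math.sqrt(sum(a * a for a in x))  # Euclidean length of X
    norm_y = math.sqrt(sum(b * b for b in y))  # Euclidean length of Y
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: a document with no keywords matches nothing
    return inner / (norm_x * norm_y)
```

For binary vectors the inner product counts the keywords two documents share, so the value lies between 0 (no shared keywords) and 1 (identical keyword sets).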