Similar concepts
Pages with this concept
Similarity | Page | Snapshot |
| 131 |
When computing $I(x_i, x_j)$ for the purpose of constructing an MST we need only to know the rank ordering of the $I(x_i, x_j)$'s
...then $I(x_i, x_j)$ will be strictly monotone with ... This is an extremely simple formulation of EMIM and easy to compute
...The problem of what to do with zero entries in one of the cells 1 to 4 is taken care of by letting $0 \log 0 = 0$
...Next we discuss the possibility of approximation
...$d(x_i, x_j) = P(x_i = 1, x_j = 1) - P(x_i = 1) P(x_j = 1)$ to measure the deviation from independence for any two index terms i and j
... |
| 123 |
probability function $P(x)$, and of course a better approximation than the one afforded by making assumption A1
...The goodness of the approximation is measured by a well-known function (see, for example, Kullback [12]); if $P(x)$ and $P_a(x)$ are two discrete probability distributions then ... That this is indeed the case is shown by Ku and Kullback [11] ... is a measure of the extent to which $P_a(x)$ approximates $P(x)$
...If the extent to which two index terms i and j deviate from independence is measured by the expected mutual information measure (EMIM; see Chapter 3, p. 41)
...then the best approximation $P_t(x)$, in the sense of minimising $I(P, P_t)$, is given by the maximum spanning tree (MST), see Chapter 3, p
...is a maximum
...One way of looking at the MST is that it incorporates the most significant of the dependences between the variables subject to the global constraint that the sum of them should be a maximum
... |
| 139 |
I must emphasise that the above argument leading to the hypothesis is not a proof
...One consequence of the discrimination hypothesis is that it provides a rationale for ranking the index terms connected to a query term in the dependence tree in order of $I(\text{term}, \text{query term})$ values to reflect the order of discrimination power values
...Bibliographic remarks: The basic background reading for this chapter is contained in but a few papers
... |
| 41 |
keyword is indicated by a zero or one in the i-th position respectively
...where summation is over the total number of different keywords in the document collection
...Salton considered document representatives as binary vectors embedded in an n-dimensional Euclidean space, where n is the total number of index terms
...can then be interpreted as the cosine of the angular separation of the two binary vectors X and Y
...where $(X, Y)$ is the inner product and
...$X = (x_1,$
...we get ... Some authors have attempted to base a measure of association on a probabilistic model [18]
...When $x_i$ and $x_j$ are independent, $P(x_i) P(x_j) = P(x_i, x_j)$ and so $I(x_i, x_j) = 0$
... |
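The page-41 snapshot describes treating two binary document representatives X and Y as vectors and reading their association as the cosine of the angle between them, computed from the inner product and the vector lengths. A minimal sketch of that reading follows; the function name `cosine`, the toy vectors, and the choice to return 0.0 for an all-zero vector are illustrative assumptions, not taken from the source.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length binary vectors."""
    inner = sum(a * b for a, b in zip(x, y))   # inner product (X, Y)
    norm_x = math.sqrt(sum(a * a for a in x))  # length of X
    norm_y = math.sqrt(sum(b * b for b in y))  # length of Y
    if norm_x == 0 or norm_y == 0:
        # Assumption: a document with no keywords is treated as unrelated.
        return 0.0
    return inner / (norm_x * norm_y)


# Hypothetical representatives over four index terms.
X = [1, 1, 0, 1]
Y = [1, 0, 0, 1]
similarity = cosine(X, Y)
```

For binary vectors the inner product is simply the number of keywords the two documents share, so the measure needs only three counts.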
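Several snapshots above refer to computing the expected mutual information measure (EMIM) between pairs of index terms, with the convention that 0 log 0 = 0, and to building a maximum spanning tree (MST) over those values. The sketch below is one way to put the two together, assuming the usual definition of EMIM as the sum over the four cells of $P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i) P(x_j)}$ estimated from document counts. The function names, the toy collection, and the use of Kruskal's algorithm are choices made for illustration, not something the source specifies.

```python
import math
from itertools import combinations


def emim(col_i, col_j):
    """EMIM between two binary term-occurrence columns of equal length."""
    n = len(col_i)
    total = 0.0
    for a in (0, 1):
        for b in (0, 1):
            joint = sum(1 for x, y in zip(col_i, col_j) if x == a and y == b) / n
            if joint == 0.0:
                continue  # convention: 0 log 0 = 0
            p_a = sum(1 for x in col_i if x == a) / n
            p_b = sum(1 for y in col_j if y == b) / n
            total += joint * math.log(joint / (p_a * p_b))
    return total


def maximum_spanning_tree(num_terms, weights):
    """Kruskal's algorithm, heaviest edges first. weights maps (i, j) to a value."""
    parent = list(range(num_terms))

    def find(u):
        # Follow parent links to the component root, halving paths on the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components, so keep it
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Hypothetical collection: rows are documents, columns are index terms.
docs = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]
columns = list(zip(*docs))
weights = {
    (i, j): emim(columns[i], columns[j])
    for i, j in combinations(range(len(columns)), 2)
}
tree = maximum_spanning_tree(len(columns), weights)
```

Because the tree construction only sorts the edge values, any strictly monotone transformation of the EMIM values would yield the same tree, which is the point made in the page-131 snapshot about needing only the rank ordering.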