Similar pages
Page 42:
nice property of being invariant under one-to-one transformations of the co-ordinates.
...A function very similar to the expected mutual information measure was suggested by Jardine and Sibson [2] specifically to measure dissimilarity between two classes of objects.
...Here u and v are positive weights adding to unity.
...Setting $P(x) = P(x \mid w_1)P(w_1) + P(x \mid w_2)P(w_2)$, $x = 0, 1$, and $P(x, w_i) = P(x \mid w_i)P(w_i)$, $i = 1, 2$, we recover the expected mutual information measure $I(x, w_i)$.
...
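To make the measure concrete, here is a minimal Python sketch of the expected mutual information measure for a binary index term x and the relevance variable w, assuming the joint distribution is given as a 2x2 table; the function name and example numbers are illustrative, not from the text.

```python
# Sketch: expected mutual information measure (EMIM) between a binary
# index-term variable x and the relevance variable w, given the joint
# distribution P(x, w) as a 2x2 table.
import math

def emim(joint):
    """joint[x][w] = P(x, w) for x, w in {0, 1}; returns I(x, w)."""
    px = [joint[x][0] + joint[x][1] for x in (0, 1)]   # marginal P(x)
    pw = [joint[0][w] + joint[1][w] for w in (0, 1)]   # marginal P(w)
    total = 0.0
    for x in (0, 1):
        for w in (0, 1):
            if joint[x][w] > 0:                        # 0 log 0 = 0
                total += joint[x][w] * math.log(joint[x][w] / (px[x] * pw[w]))
    return total

# Example: a term that co-occurs strongly with relevance.
print(emim([[0.40, 0.10],
            [0.05, 0.45]]))   # > 0; independence would give exactly 0
```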
Page 133:
...It must be emphasised that in the non-linear case the estimation of the parameters for $g(x)$ will ideally involve a different MST for each of $P(x \mid w_1)$ and $P(x \mid w_2)$.
...There is a choice of how one would implement the model for $g(x)$, depending on whether one is interested in setting the cut-off a priori or a posteriori.
...If one assumes that the cut-off is set a posteriori, then we can rank the documents according to $P(w_1 \mid x)$ and leave the user to decide when he has seen enough.
...to calculate an estimate of the probability of relevance for each document x.
...
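As a rough illustration of a posteriori ranking by $P(w_1 \mid x)$, the sketch below applies Bayes' theorem to assumed class-conditional probabilities; the document ids, scores and prior are invented for the example.

```python
# Sketch: a posteriori ranking by P(w1 | x) via Bayes' theorem, assuming
# class-conditional probabilities P(x | w1), P(x | w2) and a prior P(w1)
# are already available.

def p_relevant(px_w1: float, px_w2: float, prior_w1: float) -> float:
    """P(w1 | x) = P(x|w1) P(w1) / (P(x|w1) P(w1) + P(x|w2) P(w2))."""
    prior_w2 = 1.0 - prior_w1
    num = px_w1 * prior_w1
    return num / (num + px_w2 * prior_w2)

# Rank documents (id, P(x|w1), P(x|w2)); the cut-off is left to the user.
docs = [("d1", 0.020, 0.001), ("d2", 0.004, 0.003), ("d3", 0.001, 0.010)]
ranked = sorted(docs, key=lambda d: p_relevant(d[1], d[2], prior_w1=0.05),
                reverse=True)
print([d[0] for d in ranked])   # d1 first: highest probability of relevance
```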
Page 134:
which from a computational point of view would simplify things enormously.
...An alternative way of using the dependence tree ...
Association Hypothesis. Some of the arguments advanced in the previous section can be construed as implying that the only dependence tree we have enough information to construct is the one on the entire document collection.
...The basic idea underlying term clustering was explained in Chapter 2.
...If an index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this.
...
Page 137:
the different contributions made to the measure by the different cells.
...Discrimination gain hypothesis. In the derivation above I have made the assumption of independence or dependence in a straightforward way.
...
$$P(x_i, x_j) = P(x_i, x_j \mid w_1)P(w_1) + P(x_i, x_j \mid w_2)P(w_2)$$
$$P(x_i)P(x_j) = \left[P(x_i \mid w_1)P(w_1) + P(x_i \mid w_2)P(w_2)\right]\left[P(x_j \mid w_1)P(w_1) + P(x_j \mid w_2)P(w_2)\right]$$
If we assume conditional independence on both $w_1$ and $w_2$, then
$$P(x_i, x_j) = P(x_i \mid w_1)P(x_j \mid w_1)P(w_1) + P(x_i \mid w_2)P(x_j \mid w_2)P(w_2)$$
For unconditional independence as well, we must have $P(x_i, x_j) = P(x_i)P(x_j)$. This will only happen when $P(w_1) = 0$ or $P(w_2) = 0$, or $P(x_i \mid w_1) = P(x_i \mid w_2)$, or $P(x_j \mid w_1) = P(x_j \mid w_2)$; or in words, when at least one of the index terms is useless at discriminating relevant from non-relevant documents.
...Kendall and Stuart [26] define a partial correlation coefficient for any two distributions by ...
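The conclusion above can be checked numerically. The sketch below compares $P(x_i, x_j)$ with $P(x_i)P(x_j)$ under conditional independence, using made-up probabilities; the two agree exactly when one term has $P(x_j \mid w_1) = P(x_j \mid w_2)$, i.e. is useless for discrimination.

```python
# Numeric check of the discrimination gain argument: under conditional
# independence on w1 and w2, P(xi, xj) differs from P(xi) P(xj) unless
# one term fails to discriminate w1 from w2. Probabilities are invented;
# only the cell xi = 1, xj = 1 is compared, which suffices to show it.

def joint(pi1, pj1, pi2, pj2, pw1):
    """P(xi=1, xj=1) assuming conditional independence given w1 and w2."""
    return pi1 * pj1 * pw1 + pi2 * pj2 * (1.0 - pw1)

def product_of_marginals(pi1, pj1, pi2, pj2, pw1):
    pi = pi1 * pw1 + pi2 * (1.0 - pw1)   # P(xi = 1)
    pj = pj1 * pw1 + pj2 * (1.0 - pw1)   # P(xj = 1)
    return pi * pj

# Both terms discriminate: joint != product (dependence is induced).
print(joint(0.8, 0.7, 0.2, 0.1, 0.3),
      product_of_marginals(0.8, 0.7, 0.2, 0.1, 0.3))   # 0.182 vs 0.1064

# Make xj useless (P(xj|w1) = P(xj|w2) = 0.4): the two now agree.
print(joint(0.8, 0.4, 0.2, 0.4, 0.3),
      product_of_marginals(0.8, 0.4, 0.2, 0.4, 0.3))   # 0.152 vs 0.152
```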
Page 120:
convenience let us set ... There are a number of ways of looking at $K_i$.
...Typically the weight $K_i(N, r, n, R)$ is estimated from a contingency table in which N is not the total number of documents in the system, but instead is some subset specifically chosen to enable $K_i$ to be estimated.
...The index terms are not independent. Although it may be mathematically convenient to assume that the index terms are independent, it by no means follows that it is realistic to do so.
...
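The snapshot names the weight $K_i(N, r, n, R)$ but elides its formula; the sketch below assumes the familiar log-odds estimate from a 2x2 contingency table (as in the Robertson and Sparck Jones relevance weight), which may differ from the exact form used in the text.

```python
# Sketch: estimating a term weight K_i from a 2x2 contingency table, with
# r relevant documents containing the term, n documents containing the
# term, R relevant documents, and N documents in the chosen subset.
# Assumed form: K_i = log [ r (N - n - R + r) ] / [ (R - r) (n - r) ].
import math

def k_weight(N: int, r: int, n: int, R: int) -> float:
    a = r                       # relevant, term present
    b = n - r                   # non-relevant, term present
    c = R - r                   # relevant, term absent
    d = N - n - R + r           # non-relevant, term absent
    return math.log((a * d) / (b * c))

# A term in 8 of 10 relevant documents but only 20 of 190 non-relevant
# ones gets a large positive weight.
print(k_weight(N=200, r=8, n=28, R=10))   # log(34) ~ 3.53
```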
Page 140:
derives from the work of Yu and his collaborators [28, 29]. ...According to Doyle [32], p. ...
...The model in this chapter also connects with two other ideas in earlier research.
...or in words, for any document the probability of relevance is inversely proportional to the probability with which it will occur on a random basis.
...
Page 41:
keyword is indicated by a zero or one in the i-th position respectively.
...where summation is over the total number of different keywords in the document collection.
...Salton considered document representatives as binary vectors embedded in an n-dimensional Euclidean space, where n is the total number of index terms.
...can then be interpreted as the cosine of the angular separation of the two binary vectors X and Y.
...where $(X, Y)$ is the inner product and ...
...$X = (x_1, \ldots$ ... we get ... Some authors have attempted to base a measure of association on a probabilistic model [18]. ...When $x_i$ and $x_j$ are independent, $P(x_i)P(x_j) = P(x_i, x_j)$ and so $I(x_i, x_j) = 0$.
...
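For binary vectors the cosine coefficient has a particularly simple form: the inner product counts shared terms, and each norm is the square root of the number of terms present. A minimal sketch, with invented vectors:

```python
# Sketch: cosine of the angular separation of two binary vectors, as in
# Salton's vector representation. For binary X and Y this reduces to
#   (X, Y) / (||X|| ||Y||) = |X and Y| / sqrt(|X| |Y|),
# where |X| counts the index terms present in X.
import math

def cosine(x: list[int], y: list[int]) -> float:
    inner = sum(a * b for a, b in zip(x, y))      # (X, Y): shared terms
    return inner / math.sqrt(sum(x) * sum(y))     # ||X|| = sqrt(|X|)

X = [0, 1, 1, 0, 0, 1]
Y = [0, 1, 0, 0, 1, 1]
print(cosine(X, Y))   # 2 / sqrt(3 * 3) ~ 0.667
```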
Page 124:
example, in Figure 6 ...
...$I(x_1, x_2) + I(x_2, x_3) + I(x_2, x_4) + I(x_2, x_5) + I(x_5, x_6)$ is a maximum.
...Once the dependence tree has been found, the approximating distribution can be written down immediately in the form A2.
...Setting $t_i = \operatorname{Prob}(x_i = 1 \mid x_{j(i)} = 1)$, $r_i = \operatorname{Prob}(x_i = 1 \mid x_{j(i)} = 0)$ and $r_1 = \operatorname{Prob}(x_1 = 1)$, we have
$$P(x_i \mid x_{j(i)}) = \left[t_i^{x_i}(1 - t_i)^{1 - x_i}\right]^{x_{j(i)}}\left[r_i^{x_i}(1 - r_i)^{1 - x_i}\right]^{1 - x_{j(i)}}$$
then ... This is a non-linear weighting function which will simplify to the one derived from A1 when the variables are assumed to be independent, that is, when $t_i = r_i$.
...$g(x) = \log P(x \mid w_1) - \log P(x \mid w_2)$, which now involves the calculation or estimation of twice as many parameters as in the linear case.
...It is easier to see how $g(x)$ combines different weights for different terms if one looks at the weights contributed to $g(x)$ for a given ...
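The dependence tree itself is the spanning tree that maximises the sum of the pairwise EMIM weights $I(x_i, x_j)$. A small Prim-style sketch of that step, with invented EMIM values (0-based term indices):

```python
# Sketch: choosing the dependence tree as the maximum spanning tree over
# pairwise EMIM weights I(xi, xj). A greedy Prim-style step suffices for
# a tree this small; the weights below are illustrative only.

def maximum_spanning_tree(n, weight):
    """weight[(i, j)] = I(xi, xj) for i < j; returns the tree's edges."""
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # Pick the heaviest edge with exactly one endpoint in the tree.
        best = max(((i, j) for (i, j) in weight
                    if (i in in_tree) != (j in in_tree)),
                   key=lambda e: weight[e])
        edges.append(best)
        in_tree.update(best)
    return edges

# Pairwise EMIM estimates for six terms (invented numbers).
emim = {(0, 1): 0.30, (1, 2): 0.25, (1, 3): 0.22, (1, 4): 0.28,
        (4, 5): 0.20, (0, 2): 0.05, (2, 3): 0.04, (3, 5): 0.03}
print(maximum_spanning_tree(6, emim))
# -> [(0, 1), (1, 4), (1, 2), (1, 3), (4, 5)]: the sum-maximising tree
```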
Page 118:
Theorem is the best way of getting at it.
...$P(x \mid w_i) = P(x_1 \mid w_i)P(x_2 \mid w_i) \cdots P(x_n \mid w_i)$
...Later I shall show how this stringent assumption may be relaxed.
...Let us now take the simplified form of $P(x \mid w_i)$ and work out what the decision rule will look like.
...$p_i = \operatorname{Prob}(x_i = 1 \mid w_1)$, $q_i = \operatorname{Prob}(x_i = 1 \mid w_2)$.
...In words, $p_i$ ($q_i$) is the probability that, if the document is relevant (non-relevant), the i-th index term will be present.
...To appreciate how these expressions work, the reader should check that $P((0,1,1,0,0,1) \mid w_1) = (1 - p_1)\,p_2\,p_3\,(1 - p_4)(1 - p_5)\,p_6$.
...where the constants $a_i$, $b_i$ and $e$ are obvious.
...
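Expanding $g(x) = \log P(x \mid w_1) - \log P(x \mid w_2)$ under the independence assumption gives a linear function of the $x_i$. The sketch below uses the standard log-odds expansion, which is one way of making the constants concrete; the exact $a_i$, $b_i$ and $e$ of the text are not shown in the snapshot.

```python
# Sketch: the linear decision function from the independence assumption.
# With p_i = Prob(x_i = 1|w1) and q_i = Prob(x_i = 1|w2),
#   g(x) = sum_i x_i log[p_i (1-q_i) / (q_i (1-p_i))]
#        + sum_i log[(1-p_i) / (1-q_i)],
# a per-term weight times x_i plus a constant.
import math

def g(x, p, q):
    """Linear discriminant for a binary document vector x."""
    score = 0.0
    for xi, pi, qi in zip(x, p, q):
        score += xi * math.log(pi * (1 - qi) / (qi * (1 - pi)))
        score += math.log((1 - pi) / (1 - qi))   # per-term constant part
    return score

p = [0.8, 0.6, 0.5, 0.2, 0.3, 0.7]   # P(term present | relevant)
q = [0.3, 0.4, 0.5, 0.2, 0.6, 0.1]   # P(term present | non-relevant)
print(g([0, 1, 1, 0, 0, 1], p, q))   # positive score favours relevance
```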