Concept: Probability approximation

Similar concepts

Similarity

Concept

Information measure

Probabilistic retrieval

Theory of measurement

Information retrieval definition

Operational information retrieval

Measures of association

Pages with this concept

Similarity

Page

Snapshot

123

probability function P(x), and of course a better approximation than the one afforded by making assumption A1 ... The goodness of the approximation is measured by a well-known function (see, for example, Kullback [12]); if P(x) and Pa(x) are two discrete probability distributions then ... That this is indeed the case is shown by Ku and Kullback [11] ... is a measure of the extent to which Pa(x) approximates P(x) ... If the extent to which two index terms i and j deviate from independence is measured by the expected mutual information measure EMIM (see Chapter 3, p. 41) ... then the best approximation Pt(x), in the sense of minimising I(P, Pt), is given by the maximum spanning tree MST, see Chapter 3, p ... is a maximum ... One way of looking at the MST is that it incorporates the most significant of the dependences between the variables subject to the global constraint that the sum of them should be a maximum ...
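The snapshot above names two ingredients: an information measure I(P, Pa) for judging how well Pa(x) approximates P(x), and a maximum spanning tree built from pairwise EMIM weights. The following is a minimal Python sketch of both, not the source's own implementation. It assumes binary index terms stored as 0/1 rows, one row per document. The function names (`information_measure`, `emim`, `maximum_spanning_tree`) are illustrative. The divergence is written in the standard Kullback form, the sum of P(x) log(P(x)/Pa(x)), which is an assumption because the snapshot elides the formula.

```python
import math
from itertools import combinations


def information_measure(p, pa):
    """Sum over x of p(x) * log(p(x) / pa(x)) (assumed Kullback form).

    p and pa are sequences of probabilities over the same outcomes;
    pa(x) must be positive wherever p(x) is positive.
    """
    return sum(px * math.log(px / pax) for px, pax in zip(p, pa) if px > 0)


def emim(docs, i, j):
    """Expected mutual information between binary terms i and j.

    docs is a list of equal-length 0/1 lists, one list per document.
    Probabilities are plain relative frequencies (no smoothing).
    """
    n = len(docs)
    total = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for d in docs if d[i] == a and d[j] == b) / n
            p_a = sum(1 for d in docs if d[i] == a) / n
            p_b = sum(1 for d in docs if d[j] == b) / n
            if p_ab > 0:
                total += p_ab * math.log(p_ab / (p_a * p_b))
    return total


def maximum_spanning_tree(n_terms, weight):
    """Kruskal's algorithm, taking edges in order of decreasing weight.

    weight(i, j) returns the edge weight between terms i and j.
    Returns a list of (i, j) edges; the sum of their weights is maximal.
    """
    parent = list(range(n_terms))

    def find(u):
        # Follow parent links to the component root, compressing the path.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = sorted(combinations(range(n_terms), 2),
                   key=lambda e: weight(*e), reverse=True)
    tree = []
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Hypothetical toy collection: 4 documents, 3 index terms.
toy_docs = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
toy_tree = maximum_spanning_tree(3, lambda i, j: emim(toy_docs, i, j))
```

In the toy collection, terms 0 and 1 always co-occur, so the edge between them carries the largest EMIM and is selected first; term 2 is independent of both and is attached by a zero-weight edge.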

141

that this principle works so well is not yet clear but see Yu and Salton's recent theoretical paper [39] ... The connection with term clustering was already made earlier on in the chapter ... It should be clear now that the quantitative model embodies within one theory such diverse topics as term clustering, early association analysis, document frequency weighting, and relevance weighting ...

133

3 ... It must be emphasised that in the non-linear case the estimation of the parameters for g(x) will ideally involve a different MST for each of P(x|w1) and P(x|w2) ... There is a choice of how one would implement the model for g(x) depending on whether one is interested in setting the cut-off a priori or a posteriori ... If one assumes that the cut-off is set a posteriori then we can rank the documents according to P(w1|x) and leave the user to decide when he has seen enough ... to calculate (estimate) the probability of relevance for each document x ...
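The two cut-off choices mentioned in the snapshot above can be contrasted in a short Python sketch. It assumes the estimated probabilities of relevance P(w1|x) have already been computed and are held in a dictionary keyed by document identifier. The function names, the identifiers, and the numeric values are hypothetical and serve only as illustration.

```python
def rank_a_posteriori(scores):
    """Return document ids in order of decreasing estimated P(w1|x).

    No cut-off is applied: the user reads down the ranking and stops
    when enough has been seen.
    """
    return sorted(scores, key=scores.get, reverse=True)


def retrieve_a_priori(scores, cutoff):
    """Return only documents whose estimate exceeds a cut-off fixed in advance."""
    return [doc for doc in rank_a_posteriori(scores) if scores[doc] > cutoff]


# Hypothetical estimates of P(w1|x) for three documents.
estimates = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
ranking = rank_a_posteriori(estimates)
retrieved = retrieve_a_priori(estimates, cutoff=0.4)
```

With an a posteriori cut-off the system only has to produce the ranking, whereas an a priori cut-off requires committing to a threshold before any documents are shown.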