Concept: Maximum spanning tree

Maximum spanning tree

Similar concepts

Similarity

Concept

Information retrieval definition

Operational information retrieval

Data model

Experimental information retrieval

Information measure

Information content

Hierarchical data model

Information structure

Network data model

Information Radius

Pages with this concept

Similarity

Page

Snapshot

123

probability function P x,and of course a better approximation than the one afforded by making assumption A 1 ...The goodness of the approximation is measured by a well known function see,for example,Kullback [12];if P x and Pa x are two discrete probability distributions then That this is indeed the case is shown by Ku and Kullback [11]...is a measure of the extent to which P a x approximates P x ...If the extent to which two index terms i and j deviate from independence is measured by the expected mutual information measure EMIM see Chapter 3,p 41 ...then the best approximation Pt x,in the sense of minimising I P,Pt,is given by the maximum spanning tree MST see Chapter 3,p ...is a maximum ...One way of looking at the MST is that it incorporates the most significant of the dependences between the variables subject to the global constraint that the sum of them should be a maximum ...

second tree is quite different from the first,the nodes instead of representing clusters represent the individual objects to be clustered ...The MST contains more information than the single link hierarchy and only indirectly information about the single link clusters ...The representation of the single link hierarchy through an MST has proved very useful in connecting single link with other clustering techniques [51]...Implication of classification methods It is fairly difficult to talk about the implementation of anautomatic classification method without at the same time referring tothe file

132

calculated more efficiently based on than one based on the full EMIM ...as a measure of association ...2 ...There are numerous published algorithms for generating an MST from pairwise association measures,the most efficient probably being the recent one due to Whitney [21]...It is along these lines that Bentley and Friedman [22]have shown that by exploiting the geometry of the space in which the index terms are points the computation time for generating the MST can be shown to be almost always 0 n log n ...One major inefficiency in generating the MST is of course due to the fact that all n n 1 2 associations are computed whereas only a small number are in fact significant in the sense that they are non zero and could therefore be chosen for a weight of an edge in the spanning tree ...

139

I must emphasise that the above argument leading to the hypothesis is not a proof ...One consequence of the discrimination hypothesis is that it provides a rationale for ranking the index terms connected to a query term in the dependence tree in order of I term,query term values to reflect the order of discrimination power values ...Bibliographic remarks The basis background reading for this chapter is contained in but a few papers ...