Concepts and similar pages to Page 55

Concepts

Similarity

Concept

Single link algorithm

Connection matrix

Dissimilarity coefficient

Dissimilarity matrix

Single link

Single Pass algorithm

Similarity matrix generation

Uniqueness of single link

Trees

Rocchio algorithm

the hierarchy one can identify a set of classes, and as one moves up the hierarchy the classes at the lower levels are nested in the classes at the higher levels ... It is now a simple matter to define single link in terms of these graphs; at any level a single link cluster is precisely the set of vertices of a connected component of the graph at that level ... corresponding clusters at those levels ...
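The definition quoted above can be illustrated with a minimal sketch. It assumes the objects are numbered 0..n-1, that the dissimilarities are given as a full symmetric matrix `d` (a list of lists), and that the graph at a given level joins two objects whenever their dissimilarity does not exceed that level. The function name and the input format are hypothetical, chosen only for illustration.

```python
def single_link_clusters(d, level):
    """Single-link clusters at `level`: connected components of the graph
    that joins i and j whenever d[i][j] <= level (assumed edge convention)."""
    n = len(d)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] <= level:
                parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Calling the function at increasing levels on the same matrix should give nested clusterings, which is the hierarchy the excerpt describes.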

second tree is quite different from the first, the nodes instead of representing clusters represent the individual objects to be clustered ... The MST contains more information than the single link hierarchy and only indirectly information about the single link clusters ... The representation of the single link hierarchy through an MST has proved very useful in connecting single link with other clustering techniques [51] ... Implication of classification methods It is fairly difficult to talk about the implementation of an automatic classification method without at the same time referring to the file

132

calculated more efficiently based on than one based on the full EMIM ... as a measure of association ... 2 ... There are numerous published algorithms for generating an MST from pairwise association measures, the most efficient probably being the recent one due to Whitney [21] ... It is along these lines that Bentley and Friedman [22] have shown that by exploiting the geometry of the space in which the index terms are points the computation time for generating the MST can be shown to be almost always O(n log n) ... One major inefficiency in generating the MST is of course due to the fact that all n(n-1)/2 associations are computed whereas only a small number are in fact significant in the sense that they are non-zero and could therefore be chosen for a weight of an edge in the spanning tree ...
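As a rough illustration of generating an MST from pairwise measures, here is a sketch of a plain Prim-style construction over a full symmetric dissimilarity matrix. It is not claimed to be the algorithm of Whitney [21] or of Bentley and Friedman [22]; the function name and the matrix input are assumptions made for the example. Because it scans every pair, it shows the n(n-1)/2 cost that the excerpt calls a major inefficiency.

```python
def minimum_spanning_tree(d):
    """Return MST edges (u, v, weight) for a symmetric dissimilarity
    matrix d, using a dense Prim-style scan (assumed input format)."""
    n = len(d)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known link from the tree to each vertex
    link = [None] * n          # tree vertex that provides that cheapest link
    best[0] = 0.0              # start growing the tree from vertex 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] is not None:
            edges.append((link[u], u, d[link[u]][u]))
        # Relax the links of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v] = d[u][v]
                link[v] = u
    return edges
```

Deleting every edge of the resulting tree whose weight exceeds a chosen level leaves connected pieces that are the single link clusters at that level, which is one way the MST carries the single link information indirectly.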

differences in the scale and in the use to which a classification structure is to be put ... In the case of scale, the size of the problem in IR is invariably such that for cluster methods based on similarity matrices it becomes impossible to store the entire similarity matrix, let alone allow random access to its elements ... When a classification is to be used in IR, it affects the design of the algorithm to the extent that a classification will be represented by a file structure which is (1) easily updated; (2) easily searched; and (3) reasonably compact ... Only (3) needs some further comment ... Conclusion Let me briefly summarise the logical structure of this chapter ... This chapter ended on a rather practical note ...
