Page 53 Concepts and similar pages

Concepts

Similarity Concept
Cluster representative
Single link
Cluster profile
Order dependence
Nearest neighbour classification
Document clustering
Association Measures
Cluster methods
Graph theoretic cluster methods
Document representative

Similar pages

Similarity Page Snapshot
52 have to be made of the file of object descriptions ... 1 the object descriptions are processed serially; 2 the first object becomes the cluster representative of the first cluster; 3 each subsequent object is matched against all cluster representatives existing at its processing time; 4 a given object is assigned to one cluster (or more if overlap is allowed) according to some condition on the matching function; 5 when an object is assigned to a cluster the representative for that cluster is recomputed; 6 if an object fails a certain test it becomes the cluster representative of a new cluster ... Once again the final classification is dependent on input parameters which can only be determined empirically (and which are likely to be different for different sets of objects) and must be specified in advance ... The simplest version of this kind of algorithm is probably one due to Hill [37] ... Related to the single pass approach is the algorithm of MacQueen [41] which starts with an arbitrary initial partition of the objects ... A third type of algorithm is represented by the work of Dattola [42] ...
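The six steps in the page 52 snapshot can be sketched roughly as follows. This is only an illustration under stated assumptions, not the algorithm of Hill, MacQueen or Dattola: objects are taken to be numeric vectors, the matching function is assumed to be the cosine, the cluster representative is assumed to be the centroid, overlap is not allowed, and the names `single_pass` and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Assumed matching function: cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(objects, threshold):
    """Cluster objects in one serial pass (steps 1 to 6, no overlap)."""
    clusters = []  # each cluster: {"members": [indices], "rep": centroid}
    for i, obj in enumerate(objects):          # step 1: serial processing
        best, best_score = None, threshold
        for c in clusters:                     # step 3: match all representatives
            score = cosine(obj, c["rep"])
            if score >= best_score:            # step 4: condition on the match
                best, best_score = c, score
        if best is None:                       # steps 2 and 6: start a new cluster
            clusters.append({"members": [i], "rep": list(obj)})
        else:
            best["members"].append(i)
            n = len(best["members"])           # step 5: recompute the centroid
            best["rep"] = [(r * (n - 1) + x) / n
                           for r, x in zip(best["rep"], obj)]
    return clusters
```

The `threshold` argument plays the role of the input parameter that, as the snapshot notes, has to be specified in advance and can only be chosen empirically. Because representatives are updated as objects arrive, presenting the same objects in a different order may yield a different classification.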
57 second tree is quite different from the first, the nodes instead of representing clusters represent the individual objects to be clustered ... The MST contains more information than the single link hierarchy and only indirectly information about the single link clusters ... The representation of the single link hierarchy through an MST has proved very useful in connecting single link with other clustering techniques [51] ... Implication of classification methods It is fairly difficult to talk about the implementation of an automatic classification method without at the same time referring to the file
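One way to see how the MST carries the single link clusters indirectly is to build the tree and then delete every edge longer than a chosen level; the pieces that remain are the clusters at that level. The sketch below assumes a full symmetric dissimilarity matrix and uses Prim's algorithm, which is a choice made here for illustration and is not taken from the snapshot. The function names are hypothetical.

```python
def minimum_spanning_tree(d):
    """Prim's algorithm on a symmetric dissimilarity matrix d.

    Returns the tree as a list of (u, v, weight) edges.
    """
    n = len(d)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known link from the tree to v
    link = [None] * n           # tree vertex that provides that link
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if link[u] is not None:
            edges.append((link[u], u, d[link[u]][u]))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v] = d[u][v]
                link[v] = u
    return edges


def clusters_from_mst(n, edges, level):
    """Drop MST edges longer than level and return the remaining components."""
    adj = {v: [] for v in range(n)}
    for u, v, w in edges:
        if w <= level:
            adj[u].append(v)
            adj[v].append(u)
    seen, out = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:                      # depth-first walk of one component
            x = stack.pop()
            comp.append(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        out.append(sorted(comp))
    return out
```

The tree has only n - 1 edges, so it is a compact structure from which the clusters at any level can be read off without returning to the full set of pairwise dissimilarities.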
54 the hierarchy one can identify a set of classes, and as one moves up the hierarchy the classes at the lower levels are nested in the classes at the higher levels ... It is now a simple matter to define single link in terms of these graphs; at any level a single link cluster is precisely the set of vertices of a connected component of the graph at that level ... corresponding clusters at those levels ...
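The definition in the page 54 snapshot translates almost directly into code. The sketch below assumes that the graph at a given level joins two objects whenever their dissimilarity does not exceed that level, and that dissimilarities are supplied as a dictionary keyed by index pairs; both conventions, and the name `single_link_clusters`, are assumptions made for illustration.

```python
def single_link_clusters(n, dissim, level):
    """Return the connected components of the graph at the given level.

    n      -- number of objects, indexed 0 .. n-1
    dissim -- dict mapping (i, j) pairs to a dissimilarity value
    level  -- objects i and j are joined when dissim[(i, j)] <= level
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the component root, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), value in dissim.items():
        if value <= level:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri        # merge the two components

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())
```

Calling the function with increasing values of `level` shows the nesting described in the snapshot: each cluster at a lower level is contained in some cluster at every higher level.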
48 The second criterion for choice is the efficiency of the clustering process in terms of speed and storage requirements ... Efficiency is really a property of the algorithm implementing the cluster method ... In the main, two distinct approaches to clustering can be identified: 1 the clustering is based on a measure of similarity between the objects to be clustered; 2 the cluster method proceeds directly from the object descriptions ... The most obvious examples of the first approach are the graph theoretic methods which define clusters in terms of a graph derived from the measure of similarity ... A string is a connected sequence of objects from some starting point ... A connected component is a set of objects such that each object is connected to at least one other member of the set and the set is maximal with respect to this property ... A maximal complete subgraph is a subgraph such that each node is connected to every other node in the subgraph and the set is maximal with respect to this property, i ... node were included anywhere the completeness condition would be violated ... A large class of hierarchic cluster methods is based on the initial measurement of similarity ...
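To make the definition of a maximal complete subgraph in the page 48 snapshot concrete, the sketch below enumerates all such subgraphs of a small graph. It uses the basic Bron-Kerbosch recursion, which is a technique chosen here for illustration and is not named in the snapshot; the graph is assumed to be given as a dictionary mapping each node to the set of its neighbours.

```python
def maximal_cliques(adj):
    """List every maximal complete subgraph of an undirected graph.

    adj maps each node to the set of nodes it is connected to.
    """
    cliques = []

    def expand(r, p, x):
        # r: current complete subgraph
        # p: nodes that could still extend r
        # x: nodes already tried, used to reject non-maximal results
        if not p and not x:
            cliques.append(sorted(r))   # no node can be added: r is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)
```

Unlike connected components, maximal complete subgraphs can share nodes, so a method built on them may assign one object to more than one cluster.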
38 Salton [9], he says: Clearly in practice it is not possible to match each analysed document with each analysed search request because the time consumed by such operation would be excessive ... Measures of association Some classification methods are based on a binary relationship between objects ... Informally speaking, a measure of association increases as the number or proportion of shared attribute states increases ...
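The informal description of a measure of association can be illustrated with a few set-based coefficients. The sketch assumes each object is represented by the set of attributes (for example index terms) it possesses. The three coefficients shown, simple matching, Dice and Jaccard, are standard examples selected here; the snapshot itself does not name them.

```python
def simple_matching(x, y):
    """Number of attributes the two objects share."""
    return len(x & y)


def dice(x, y):
    """Shared attributes as a proportion of the average set size."""
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 0.0


def jaccard(x, y):
    """Shared attributes as a proportion of all attributes present."""
    union = len(x | y)
    return len(x & y) / union if union else 0.0
```

The first coefficient grows with the number of shared attributes, while the other two are normalised by the sizes of the sets and so grow with the proportion shared.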
56 The appropriateness of stratified hierarchic cluster methods There are many other hierarchic cluster methods, to name but a few: complete link, average link, etc ... Stratified systems of clusters are appropriate because the level of a cluster can be used in retrieval strategies as a parameter analogous to rank position or matching function threshold in a linear search ... Given that hierarchic methods are appropriate for document clustering the question arises: Which method? The answer is that under certain conditions made precise in Jardine and Sibson [2] the only acceptable stratified hierarchic cluster method is single link ... See introduction for definition ... Single link and the minimum spanning tree The single link tree such as the one shown in Figure 3 ...