Similarity |
Page |
Snapshot |
| 55 |
This description immediately leads to an inefficient algorithm for the generation of single link classes
... |
| 60 |
differences in the scale and in the use to which a classification structure is to be put
...In the case of scale, the size of the problem in IR is invariably such that for cluster methods based on similarity matrices it becomes impossible to store the entire similarity matrix, let alone allow random access to its elements
...When a classification is to be used in IR, it affects the design of the algorithm to the extent that a classification will be represented by a file structure which is (1) easily updated; (2) easily searched; and (3) reasonably compact
...Only (3) needs some further comment
...Conclusion Let me briefly summarise the logical structure of this chapter
...This chapter ended on a rather practical note
... |
| 56 |
The appropriateness of stratified hierarchic cluster methods There are many other hierarchic cluster methods, to name but a few: complete link, average link, etc
...Stratified systems of clusters are appropriate because the level of a cluster can be used in retrieval strategies as a parameter analogous to rank position or matching function threshold in a linear search
...Given that hierarchic methods are appropriate for document clustering the question arises: Which method? The answer is that under certain conditions made precise in Jardine and Sibson [2] the only acceptable stratified hierarchic cluster method is single link
...See introduction for definition
...Single link and the minimum spanning tree The single link tree such as the one shown in Figure 3
... |
| 57 |
second tree is quite different from the first, the nodes instead of representing clusters represent the individual objects to be clustered
...The MST contains more information than the single link hierarchy and only indirectly information about the single link clusters
...The representation of the single link hierarchy through an MST has proved very useful in connecting single link with other clustering techniques [51] ...Implication of classification methods It is fairly difficult to talk about the implementation of an automatic classification method without at the same time referring to the file |
| 53 |
between the algorithms of Rocchio, Rieber and Marathe, Bonner (see below) and his own
...One further algorithm that should be mentioned here is that due to Litofsky [28] ...Finally, the Bonner [45] algorithm should be mentioned
...The major advantage of the algorithmically defined cluster methods is their speed: order n log n, where n is the number of objects to be clustered, compared with order n^2 for the methods based on association measures
...One obvious omission from the list of cluster methods is the group of mathematically or statistically based methods such as Factor Analysis and Latent Class Analysis
...The method of single link avoids the disadvantages just mentioned
...Single link The dissimilarity coefficient is the basic input to a single link clustering algorithm
... |
| 54 |
the hierarchy one can identify a set of classes, and as one moves up the hierarchy the classes at the lower levels are nested in the classes at the higher levels
...It is now a simple matter to define single link in terms of these graphs; at any level a single link cluster is precisely the set of vertices of a connected component of the graph at that level
...corresponding clusters at those levels
... |
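
The snippets for pages 54 and 57 describe single link clusters as the connected components of a graph at a given level, and note that a minimum spanning tree (MST) carries the single link hierarchy. The following is a minimal Python sketch of both ideas, not the algorithm from the text. It assumes that the graph at a level joins two objects whenever their dissimilarity is at most that level, and that the whole dissimilarity matrix fits in memory, which the snippet for page 60 says is not realistic at IR scale. All names (`UnionFind`, `single_link_clusters`, `minimum_spanning_tree`, `clusters_from_mst`) and the example matrix `D` are hypothetical.

```python
from itertools import combinations


class UnionFind:
    """Disjoint-set structure used to track connected components."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Follow parent links to the root, halving the path on the way.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        # Merge two components; return False if already connected.
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True


def _groups(uf, n):
    # Collect objects by component root and return sorted member lists.
    groups = {}
    for i in range(n):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())


def single_link_clusters(dissim, level):
    """Connected components of the graph with edges where dissim <= level.

    Assumption: this threshold graph is 'the graph at that level'.
    """
    n = len(dissim)
    uf = UnionFind(n)
    for i, j in combinations(range(n), 2):
        if dissim[i][j] <= level:
            uf.union(i, j)
    return _groups(uf, n)


def minimum_spanning_tree(dissim):
    """Kruskal's algorithm on the complete graph; returns (weight, i, j) edges."""
    n = len(dissim)
    edges = sorted((dissim[i][j], i, j) for i, j in combinations(range(n), 2))
    uf = UnionFind(n)
    return [(w, i, j) for w, i, j in edges if uf.union(i, j)]


def clusters_from_mst(mst, n, level):
    """Recover the clusters at a level by keeping only MST edges <= level."""
    uf = UnionFind(n)
    for w, i, j in mst:
        if w <= level:
            uf.union(i, j)
    return _groups(uf, n)


# Hypothetical symmetric dissimilarity matrix for five objects.
D = [
    [0, 1, 4, 6, 7],
    [1, 0, 2, 6, 7],
    [4, 2, 0, 5, 7],
    [6, 6, 5, 0, 3],
    [7, 7, 7, 3, 0],
]

if __name__ == "__main__":
    mst = minimum_spanning_tree(D)
    for level in (1, 2, 3, 5):
        print(level, single_link_clusters(D, level), clusters_from_mst(mst, len(D), level))
```

In this sketch the MST needs only n - 1 edges to reproduce the clusters at every level, whereas the direct version scans all pairs each time. That is one way to read the remark that the MST holds the single link clusters only indirectly.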