Similarity |
Page |
Snapshot |
| 57 |
second tree is quite different from the first,the nodes instead of representing clusters represent the individual objects to be clustered
...The MST contains more information than the single link hierarchy and only indirectly information about the single link clusters
...The representation of the single link hierarchy through an MST has proved very useful in connecting single link with other clustering techniques [51]...Implication of classification methods It is fairly difficult to talk about the implementation of anautomatic classification method without at the same time referring tothe file |
| 132 |
calculated more efficiently based on than one based on the full EMIM
...as a measure of association
...2
...There are numerous published algorithms for generating an MST from pairwise association measures,the most efficient probably being the recent one due to Whitney [21]...It is along these lines that Bentley and Friedman [22]have shown that by exploiting the geometry of the space in which the index terms are points the computation time for generating the MST can be shown to be almost always 0 n log n
...One major inefficiency in generating the MST is of course due to the fact that all n n 1 2 associations are computed whereas only a small number are in fact significant in the sense that they are non zero and could therefore be chosen for a weight of an edge in the spanning tree
... |
| 123 |
probability function P x,and of course a better approximation than the one afforded by making assumption A 1
...The goodness of the approximation is measured by a well known function see,for example,Kullback [12];if P x and Pa x are two discrete probability distributions then That this is indeed the case is shown by Ku and Kullback [11]...is a measure of the extent to which P a x approximates P x
...If the extent to which two index terms i and j deviate from independence is measured by the expected mutual information measure EMIM see Chapter 3,p 41
...then the best approximation Pt x,in the sense of minimising I P,Pt,is given by the maximum spanning tree MST see Chapter 3,p
...is a maximum
...One way of looking at the MST is that it incorporates the most significant of the dependences between the variables subject to the global constraint that the sum of them should be a maximum
... |
| 56 |
The appropriateness of stratified hierarchic cluster methods There are many other hierarchic cluster methods,to name but a few:complete link,average link,etc
...Stratified systems of clusters are appropriate because the level of a cluster can be used in retrieval strategies as a parameter analogous to rank position or matching function threshold in a linear search
...Given that hierarchic methods are appropriate for document clustering the question arises:Which method?The answer is that under certain conditions made precise in Jardine and Sibson [2]the only acceptable stratified hierarchic cluster method is single link
...See introduction for definition
...Single link and the minimum spanning tree The single link tree such as the one shown in Figure 3
... |
| 187 |
Probabilistic search strategies have not been investigated much either,although such strategies have been tried with some effect in the fields of pattern recognition and automatic medical diagnosis
...In Chapter 5 I mentioned that bottom up search strategies are apparently more successful than The work described in Chapter 6 goes some way to remedying this situation
...the more traditional top down searches
...spanning tree on the documents could be an effective structure for guiding a search for relevant documents
...4
...The three areas of research discussed so far could fruitfully be explored through a simulation model
...One major open problem is the simulation of relevance
...5
...This has been the most troublesome area in IR
... |
| 139 |
I must emphasise that the above argument leading to the hypothesis is not a proof
...One consequence of the discrimination hypothesis is that it provides a rationale for ranking the index terms connected to a query term in the dependence tree in order of I term,query term values to reflect the order of discrimination power values
...Bibliographic remarks The basis background reading for this chapter is contained in but a few papers
... |
| 142 |
14
...15
...16
...17
...18
...19
...20
...21
...22
...23
...24
...25
...26
...27
...28
...29
...30
...31
...32
...33
...34
...35
...36
...37
... |
| 64 |
38
...39
...40
...41
...42
...43
...44
...45
...46
...47
...48
...49
...50
...51
...52
...53
...54
...55
...56
...57
...58
... |