Similarity | Page | Snapshot |
| 9 |
Sparck Jones has carried on this work using measures of association between keywords based on their frequency of co-occurrence, that is, the frequency with which any two keywords occur together in the same document
...The term information structure (for want of better words) covers specifically a logical organisation of information, such as document representatives, for the purpose of information retrieval
...The organisation of these files is produced by an automatic classification method
...Evaluation of retrieval systems has proved extremely difficult
... |
| 56 |
The appropriateness of stratified hierarchic cluster methods: There are many other hierarchic cluster methods, to name but a few: complete link, average link, etc
...Stratified systems of clusters are appropriate because the level of a cluster can be used in retrieval strategies as a parameter analogous to rank position or matching function threshold in a linear search
...Given that hierarchic methods are appropriate for document clustering, the question arises: Which method? The answer is that, under certain conditions made precise in Jardine and Sibson [2], the only acceptable stratified hierarchic cluster method is single link
...See introduction for definition
...Single link and the minimum spanning tree: The single link tree such as the one shown in Figure 3
... |
| 45 |
efficiency of implementation for a particular application
...An example of an ordered classification is a hierarchy
...The discussion about classification has been purposely vague up to this point
...Let me now be more specific about current and past approaches to classification, particularly in the context of information retrieval
...The cluster hypothesis: Before describing the battery of classification methods that are now used in information retrieval, I should like to discuss the underlying hypothesis for their use in document clustering
...A basic assumption in retrieval systems is that documents relevant to a request are separated from those which are not relevant, i
...(a) both of which are relevant to a request, and (b) one of which is relevant and the other non-relevant
...Summing over a set of requests gives the relative distribution of |
| 37 |
influence the choice of [classification] method and the results obtained
...There are two main areas of application of classification methods in IR: (1) keyword clustering; (2) document clustering
...The first area is very well dealt with in a recent book by Sparck Jones [5] ...Good [6]: We define the organisation as the grouping together of items e
...The efficiency of document clustering has been emphasised by |
| 46 |
relevant-relevant (R-R) and relevant-non-relevant (R-N-R) associations of a collection
...From these it is apparent: (a) that the separation for collection X is good while for Y it is poor; and (b) that the strength of the association between relevant documents is greater for X than for Y
...Figure 3
...It is this separation between the distributions that one attempts to exploit in document clustering
...I should add that these conclusions can only be verified, finally, by experimental work on a large number of collections
... |
| 64 |
38 ... 58 (consecutive numbers only) ... |