Page 60 Concepts and similar pages

Concepts

Similarity Concept
Similarity matrix generation
Document clustering
Automatic document classification
Automatic classification
Generality
Association Measures
Cluster methods
Classification methods
Document representative
E measure

Similar pages

Similarity Page Snapshot
58 structure representing it inside the computer ... Just as in many other computational problems, it is possible to trade core storage and computation time ... One important decision to be made in any retrieval system concerns the organisation of storage ... Another good example of the difference in approach between experimental and operational implementations of a classification is in the permanence of the cluster representatives ... Probably one of the most important features of a classification implementation is that it should be able to deal with a changing and growing document collection ... Although many classification algorithms claim this feature, the claim is almost invariably not met ... These comments tend to apply to the n log n classification methods ...
59 comparison is between where n_1 < n_2 < ... In any case, if one is willing to forego some of the theoretical adequacy conditions then it is possible to modify the n ... Another comment to be made about n log n methods is that although they have this time dependence in theory, examination of a number of the algorithms implementing them shows that they actually have an n^2 dependence e ... In experiments where we are often dealing with only a few thousand documents, we may find that the proportionality constant in the n log n method is so large that the actual time taken for clustering is greater than that for an n^2 method ... The implementation of classification algorithms for use in IR is by necessity different from implementations in other fields such as, for example, numerical taxonomy ...
90 Finally, let me recommend two very readable discussions on hashing, one is in Page and Wilson [33] ... Clustered files: It is now common practice to refer to a file processed by a clustering algorithm as a clustered file, and to refer to the resulting structure as a file structure ... Bibliographic remarks: There is now a vast literature on file structures although there are very few survey articles ... A general article on data structures of a more philosophical nature well worth reading is Mealey [32] ...
4 The structure of the book: The introduction presents some basic background material, demarcates the subject and discusses loosely some of the problems in IR ... The two major chapters are those dealing with automatic classification and evaluation ... Outline: Chapter 2: Automatic Text Analysis contains a straightforward discussion of how the text of a document is represented inside a computer ... Chapter 3: Automatic Classification looks at automatic classification methods in general and then takes a deeper look at the use of these methods in information retrieval ... Chapter 4: File Structures here we try and discuss file structures from the point of view of someone primarily interested in information retrieval ... Chapter 5: Search Strategies gives an account of some search strategies when applied to document collections structured in different ways ... Chapter 6: Probabilistic Retrieval describes a formal model for enhancing retrieval effectiveness by using sample information about the