Concepts and similar pages to Page 186

Concepts

File structures

Search strategies

Data structures

Updating classifications

File organisation

Information retrieval system

Information retrieval definition

Operational information retrieval

Data retrieval systems

Cluster based retrieval

Sparck Jones has carried on this work using measures of association between keywords based on their frequency of co-occurrence, that is, the frequency with which any two keywords occur together in the same document ... The term information structure (for want of better words) covers specifically a logical organisation of information, such as document representatives, for the purpose of information retrieval ... The organisation of these files is produced by an automatic classification method ... Evaluation of retrieval systems has proved extremely difficult ...
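As an illustration of the co-occurrence frequency mentioned in the excerpt above, here is a minimal sketch that counts how often each pair of keywords appears together in the same document. The function name `cooccurrence_counts` and the sample keyword lists are assumptions made for this example. It shows only the raw count, not any particular association measure from the cited work.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for every keyword pair, the number of documents containing both.

    `documents` is an iterable of keyword lists (hypothetical input format).
    Pairs are stored as alphabetically ordered tuples.
    """
    counts = Counter()
    for keywords in documents:
        # Use a set so a keyword repeated within one document counts once.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical document representatives, each a list of keywords.
docs = [
    ["retrieval", "clustering", "index"],
    ["retrieval", "clustering"],
    ["index", "hashing"],
]
counts = cooccurrence_counts(docs)
```

A pair that never occurs together simply reads as zero, since `Counter` returns 0 for missing keys. An association measure would normally normalise these raw counts by the individual keyword frequencies.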

approaching this problem of speeding up clustering is to look for what one might call almost classifications ... A big question, that has not yet received much attention, concerns the extent to which retrieval effectiveness is limited by the type of document description used ... Document classification is a special case of a more general process which would also attempt to exploit relationships between documents ... An argument parallel to the one in the last paragraph could be given for automatic keyword classification, which in the more general context might be called automatic content unit classification ... H ...

structure representing it inside the computer ... Just as in many other computational problems, it is possible to trade core storage and computation time ... One important decision to be made in any retrieval system concerns the organisation of storage ... Another good example of the difference in approach between experimental and operational implementations of a classification is in the permanence of the cluster representatives ... Probably one of the most important features of a classification implementation is that it should be able to deal with a changing and growing document collection ... Although many classification algorithms claim this feature, the claim is almost invariably not met ... These comments tend to apply to the n log n classification methods ...

Finally, let me recommend two very readable discussions on hashing; one is in Page and Wilson [33] ... Clustered files: It is now common practice to refer to a file processed by a clustering algorithm as a clustered file, and to refer to the resulting structure as a file structure ... Bibliographic remarks: There is now a vast literature on file structures although there are very few survey articles ... A general article on data structures of a more philosophical nature well worth reading is Mealey [32] ...

packing the nodes of the tree on the disk given the access characteristics of the disk ... The work on data bases has been very much concerned with a concept called data independence ... There is a school of thought that says that applications in library automation and information retrieval should follow this path as well [6,7] ... Nevertheless, it is worth taking seriously the trend away from user knowledge of file structures, a trend that has been stimulated considerably by attempts to construct a theory of data [8,9], which has become known as the relational model ... A second approach is the hierarchical approach ... The third approach is the network approach associated with the proposals by the Data Base Task Group of CODASYL ...

retrieval ... A now classic paper on the limitations of a Boolean search is Verhoeff et al ... References ...

The structure of the book: The introduction presents some basic background material, demarcates the subject and discusses loosely some of the problems in IR ... The two major chapters are those dealing with automatic classification and evaluation ... Outline: Chapter 2: Automatic Text Analysis contains a straightforward discussion of how the text of a document is represented inside a computer ... Chapter 3: Automatic Classification looks at automatic classification methods in general and then takes a deeper look at the use of these methods in information retrieval ... Chapter 4: File Structures: here we try to discuss file structures from the point of view of someone primarily interested in information retrieval ... Chapter 5: Search Strategies gives an account of some search strategies when applied to document collections structured in different ways ... Chapter 6: Probabilistic Retrieval describes a formal model for enhancing retrieval effectiveness by using sample information about the

have been clustered as clustered files ... Some of the work that has been largely ignored in this chapter, but which is nevertheless of importance when considering the implementation of a file structure, is concerned directly with the physical organisation of a storage device in terms of block sizes, etc ... References ...

The main difficulty with this kind of search strategy is the specification of the threshold or cut-off ... Cluster representatives: Before we can sensibly talk about search strategies applied to clustered document collections, we need to say a little about the methods used to represent clusters ... A cluster representative should be such that an incoming query will be diagnosed into the cluster containing the documents relevant to the query ... Let me first give an example of a very primitive cluster representative ...
