Concepts and similar pages to Page 4

Concepts

Automatic text analysis

Experimental information retrieval

Information retrieval definition

Operational information retrieval

Retrieval effectiveness

Automatic document classification

Two AUTOMATIC TEXT ANALYSIS Introduction Before a computerised information retrieval system can actually operate to retrieve some information, that information must have already been stored inside the computer ... The starting point of the text analysis process may be the complete document text, an abstract, the title only, or perhaps a list of words only ... The developments and advances in the process of representation have been reviewed every year by the appropriate chapters of Cuadra's Annual Review of Information Science and Technology ...

think of retrieval effectiveness in terms of precision and recall ... Bibliographic remarks The best introduction to information retrieval is probably got by reading some of the early papers in the field ... One early publication worth reading which is rather hard to come by is the report on the Cranfield II project by Cleverdon et al ... Papers on information retrieval have a tendency to get published in journals on computer science and library science ...

Finally, let me recommend two very readable discussions on hashing, one is in Page and Wilson [33] ... Clustered files It is now common practice to refer to a file processed by a clustering algorithm as a clustered file, and to refer to the resulting structure as a file structure ... Bibliographic remarks There is now a vast literature on file structures although there are very few survey articles ... A general article on data structures of a more philosophical nature well worth reading is Mealey [32] ...

that in IR we are searching for relevant documents as opposed to exactly matching items ... Many automatic information retrieval systems are experimental ... Many of the techniques I shall discuss will not have proved themselves incontrovertibly superior to all other techniques, but they have promise and their promise will only be realised when they are understood ... My aim throughout has been to give a complete coverage of the more important ideas current in various special areas of information retrieval ...

Sparck Jones has carried on this work using measures of association between keywords based on their frequency of co-occurrence, that is, the frequency with which any two keywords occur together in the same document ... The term information structure, for want of better words, covers specifically a logical organisation of information, such as document representatives, for the purpose of information retrieval ... The organisation of these files is produced by an automatic classification method ... Evaluation of retrieval systems has proved extremely difficult ...

185

approaching this problem of speeding up clustering is to look for what one might call almost classifications ... A big question, that has not yet received much attention, concerns the extent to which retrieval effectiveness is limited by the type of document description used ... Document classification is a special case of a more general process which would also attempt to exploit relationships between documents ... An argument parallel to the one in the last paragraph could be given for automatic keyword classification, which in the more general context might be called automatic content unit classification ... H ...

structure representing it inside the computer ... Just as in many other computational problems, it is possible to trade core storage and computation time ... One important decision to be made in any retrieval system concerns the organisation of storage ... Another good example of the difference in approach between experimental and operational implementations of a classification is in the permanence of the cluster representatives ... Probably one of the most important features of a classification implementation is that it should be able to deal with a changing and growing document collection ... Although many classification algorithms claim this feature, the claim is almost invariably not met ... These comments tend to apply to the n log n classification methods ...

186

behaviour of any one of the components depends in only an aggregate way on the behaviour of the other components ... 2 ... On the file structure chosen and the way it is used depends the efficiency of an information retrieval system ... Inverted files have been rather popular in IR systems ... There are many more problems in this area which are of interest to IR systems ... 3 ... So far fairly simple search strategies have been tried ...
