Automatic indexing

Similar concepts

Similarity Concept
Information retrieval system
Information retrieval definition
Operational information retrieval
Automatic document classification
Experimental information retrieval
Automatic classification
Data retrieval systems
Probabilistic retrieval
Index term

Pages with this concept

Similarity Page Snapshot
32 If we think of a simple retrieval strategy as operating by matching on the descriptors, whether they be keyword names or class names, then expanding representatives in either of these ways will have the effect of increasing the number of matches between document and query, and hence tends to improve recall ...Recall is defined in the introduction ...Jones [41] has reported a large number of experiments using automatic keyword classifications and found that, in general, better retrieval performance was obtained with the aid of automatic keyword classification than with the unclassified keywords alone ...Unfortunately, even here the evidence has not been conclusive ...The discussion of keyword classifications has by necessity been rather sketchy ...Normalisation. It is probably useful at this stage to recapitulate and show how a number of levels of normalisation of text are involved in generating document representatives ...Index term weighting can also be thought of as a process of normalisation, if the weighting scheme takes into account the number of different index terms per document ...
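The following is a minimal sketch of the two effects described above. The keyword classification is a tiny hand-made table standing in for an automatically generated one. The weighting rule divides term frequency by the number of distinct index terms in the document. Both are illustrative assumptions, not the scheme used by Jones [41].

```python
from typing import Dict, List, Set

# Hypothetical keyword classification: each keyword maps to the class of
# keywords judged similar to it (in practice produced automatically).
KEYWORD_CLASSES: Dict[str, Set[str]] = {
    "retrieval": {"retrieval", "search"},
    "search": {"retrieval", "search"},
    "index": {"index", "indexing"},
    "indexing": {"index", "indexing"},
}

def expand(terms: Set[str]) -> Set[str]:
    """Replace each keyword by its whole keyword class (itself if unclassified)."""
    expanded: Set[str] = set()
    for t in terms:
        expanded |= KEYWORD_CLASSES.get(t, {t})
    return expanded

def matches(doc: Set[str], query: Set[str]) -> int:
    """Simple matching strategy: count descriptors shared by document and query."""
    return len(doc & query)

def normalised_weights(tokens: List[str]) -> Dict[str, float]:
    """Hypothetical weighting: term frequency divided by the number of
    distinct index terms in the document."""
    distinct = set(tokens)
    return {t: tokens.count(t) / len(distinct) for t in distinct}

doc = {"automatic", "indexing", "search"}
query = {"index", "retrieval"}
print(matches(doc, query))                  # 0
print(matches(expand(doc), expand(query)))  # 4
```

Expansion turns a query with no matching descriptors into one that matches the document on four descriptors. This is the mechanism by which keyword classification tends to raise recall.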
29 subsets differing in the extent to which they are about a word w, then the distribution of w can be described by a mixture of two Poisson distributions ...here p1 is the probability of a random document belonging to one of the subsets, and x1 and x2 are the mean occurrences in the two classes ...Although Harter [31] uses "function" in his wording of this assumption, I think "measure" would have been more appropriate ...assumption 1 we can calculate the probability of relevance for any document from one of these classes ...that is used to make the decision whether to assign an index term w that occurs k times in a document ...Finally, although tests have shown that this model assigns sensible index terms, it has not been tested from the point of view of its effectiveness in retrieval ...Discrimination and/or representation. There are two conflicting ways of looking at the problem of characterising documents for retrieval ...
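The snapshot above describes a two-Poisson mixture with parameters p1, x1 and x2. A minimal sketch of that mixture follows. The decision rule in `assign_term` (apply Bayes' rule and assign w when the posterior for the "about w" class exceeds a threshold) is a hypothetical stand-in. It is not the measure Harter [31] uses, which the excerpt elides. The parameter values in the usage are likewise made up.

```python
import math

def poisson(k: int, mean: float) -> float:
    """Probability of exactly k occurrences under a Poisson with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def two_poisson(k: int, p1: float, x1: float, x2: float) -> float:
    """Mixture: with probability p1 a document is in the class with mean x1,
    otherwise in the class with mean x2."""
    return p1 * poisson(k, x1) + (1 - p1) * poisson(k, x2)

def prob_class1(k: int, p1: float, x1: float, x2: float) -> float:
    """Bayes' rule: probability that a document containing w exactly k times
    belongs to the class with mean x1."""
    return p1 * poisson(k, x1) / two_poisson(k, p1, x1, x2)

def assign_term(k: int, p1: float, x1: float, x2: float,
                threshold: float = 0.5) -> bool:
    """Hypothetical rule: assign w as an index term when the posterior for
    the 'about w' class (assumed to be the one with the larger mean x1)
    exceeds the threshold."""
    return prob_class1(k, p1, x1, x2) > threshold

# Illustrative parameters only: 20% of documents are 'about' w.
print(assign_term(0, p1=0.2, x1=4.0, x2=0.5))  # False
print(assign_term(5, p1=0.2, x1=4.0, x2=0.5))  # True
```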
23 searching ...One last distinction: the vocabulary of an index language may be controlled or uncontrolled ...The index language which comes out of the conflation algorithm in the previous section may be described as uncontrolled, post-coordinate and derived ...There is much controversy about the kind of index language which is best for document retrieval ...Probably the most substantial evidence for automatic indexing has come out of the SMART Project 1966 ...The document representatives used by the SMART project are more sophisticated than just the lists of stems extracted by conflation ...
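The next sketch illustrates what "uncontrolled, post-coordinate and derived" means for a stem-based index language. The terms are derived from the text, and every stem the conflation step produces becomes a term, so the vocabulary is uncontrolled. Terms are only combined at search time, here by Boolean AND, which makes the language post-coordinate. The suffix list is a deliberately crude, hypothetical stand-in for the conflation algorithm the text refers to, not a real stemmer.

```python
from typing import Dict, List, Set

# Hypothetical suffix list, longest first; illustration only.
SUFFIXES = ("ations", "ation", "ing", "ed", "es", "s")

def conflate(word: str) -> str:
    """Crude longest-match suffix stripping that keeps a stem of >= 3 letters."""
    w = word.lower()
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            return w[: -len(suf)]
    return w

def index_documents(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Derived, uncontrolled index: every stem becomes an index term
    (stored here as an inverted file from stem to document ids)."""
    inverted: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for token in text.split():
            inverted.setdefault(conflate(token), set()).add(doc_id)
    return inverted

def search_and(inverted: Dict[str, Set[str]], query_words: List[str]) -> Set[str]:
    """Post-coordination: single terms are combined only at search time."""
    postings = [inverted.get(conflate(w), set()) for w in query_words]
    return set.intersection(*postings) if postings else set()

docs = {
    "d1": "indexing documents",
    "d2": "document classification",
    "d3": "automatic indexes",
}
inverted = index_documents(docs)
print(search_and(inverted, ["index", "documents"]))  # {'d1'}
```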