Content bearing words

Similar concepts

Similarity Concept
Experimental information retrieval
Information retrieval system
Information retrieval definition
Operational information retrieval
Retrieval effectiveness
Automatic document classification
Automatic classification
Data retrieval systems
Generality
Probabilistic retrieval

Pages with this concept

Similarity Page Snapshot
27 Probabilistic indexing In the past few years, a detailed quantitative model for automatic indexing based on some statistical assumptions about the distribution of words in text has been worked out by Bookstein, Swanson, and Harter [29, 30, 31] ... In their model they consider the difference in the distributional behaviour of words as a guide to whether a word should be assigned as an index term ... In general the parameter x will vary from word to word, and for a given word should be proportional to the length of the text ... The Bookstein-Swanson-Harter model assumes that specialty words are content bearing whereas function words are not ...
29 subsets differing in the extent to which they are about a word w then the distribution of w can be described by a mixture of two Poisson distributions ... here p_1 is the probability of a random document belonging to one of the subsets and x_1 and x_2 are the mean occurrences in the two classes ... Although Harter [31] uses 'function' in his wording of this assumption, I think 'measure' would have been more appropriate ... assumption 1 we can calculate the probability of relevance for any document from one of these classes ... that is used to make the decision whether to assign an index term w that occurs k times in a document ... Finally, although tests have shown that this model assigns sensible index terms, it has not been tested from the point of view of its effectiveness in retrieval ... Discrimination and/or representation There are two conflicting ways of looking at the problem of characterising documents for retrieval ...
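
The two-Poisson mixture mentioned in the page 29 snapshot can be illustrated with a minimal Python sketch. The names p1, x1 and x2 follow the snapshot's parameters (class probability and the two class means). The function names, the example parameter values and the 0.5 threshold are assumptions made here for illustration only; the snapshot does not give the actual decision rule, so this should not be read as the procedure used by Bookstein, Swanson and Harter.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k occurrences under a Poisson with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def two_poisson_pmf(k, p1, x1, x2):
    """Mixture of two Poisson distributions.

    p1 is the probability that a random document belongs to the first class;
    x1 and x2 are the mean occurrences of the word in the two classes.
    """
    return p1 * poisson_pmf(k, x1) + (1 - p1) * poisson_pmf(k, x2)


def posterior_class1(k, p1, x1, x2):
    """Bayes' rule: probability that a document with k occurrences is in class 1."""
    return p1 * poisson_pmf(k, x1) / two_poisson_pmf(k, p1, x1, x2)


def should_index(k, p1, x1, x2, threshold=0.5):
    """Hypothetical decision rule: assign the term when the posterior is high enough.

    The threshold is an illustrative assumption, not taken from the source.
    """
    return posterior_class1(k, p1, x1, x2) >= threshold


if __name__ == "__main__":
    # Illustrative parameters only: class 1 (about the word) has the higher mean.
    for k in range(6):
        print(k, round(posterior_class1(k, p1=0.3, x1=4.0, x2=0.5), 4))
```

With a higher mean in class 1 than in class 2, the posterior grows with the occurrence count k, which matches the intuition that a word occurring many times in a document is more likely to deserve assignment as an index term.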