Similarity | Page | Snapshot |
| 25 |
collection
...I am arguing that in using distributional information about index terms to provide, say, index term weighting we are really attacking the old problem of controlling exhaustivity and specificity
...These terms are defined in the introduction on page 10
...If we go back to Luhn's original ideas, we remember that he postulated a varying discrimination power for index terms as a function of the rank order of their frequency of occurrence, the highest discrimination power being associated with the middle frequencies
...Attempts have been made to apply weighting based on the way the index terms are distributed in the entire collection
...The difference between the last mode of weighting and the previous one may be summarised by saying that document frequency weighting places emphasis on content description whereas weighting by specificity attempts to emphasise the ability of terms to discriminate one document from another
...Salton and Yang [24] have recently attempted to combine both methods of weighting by looking at both inter-document frequencies |
| 8 |
The process may involve structuring the information in some appropriate way, such as classifying it
...Finally, we come to the output, which is usually a set of citations or document numbers
...IR in perspective: This section is not meant to constitute an attempt at an exhaustive and complete account of the historical development of IR
...Since the emphasis in this book is on a particular approach to document representation, I shall restrict myself here to a few remarks about its history
...At this point, it may be convenient to elaborate on the use of keyword
...The use of statistical information about distributions of words in documents was further exploited by Maron and Kuhns [11], who obtained statistical associations between keywords
... |
| 26 |
and intra-document frequencies
...Salton and his co-workers have developed an interesting tool for describing whether an index is good or bad
... |
| 15 |
linguistics in information science
...The chapter therefore starts with the original ideas of Luhn on which much of automatic text analysis has been built, and then goes on to describe a concrete way of generating document representatives
...Luhn's ideas: In one of Luhn's [6] early papers he states: It is here proposed that the frequency of word occurrence in an article furnishes a useful measurement of word significance
...I think this quote fairly summarises Luhn's contribution to automatic text analysis
...Let f be the frequency of occurrence of various word types in a given position of text and r their rank order, that is, the order of their frequency of occurrence, then a plot relating f and r yields a curve similar to the hyperbolic curve in Figure 2
... |
| 16 |
upper and a lower see Figure 2
...It is interesting that these ideas are really basic to much of the later work in IR
...There is no reason why such an analysis should be restricted to just words
... |
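
The snippets for pages 15, 16 and 25 describe ranking word types by frequency of occurrence and applying an upper and a lower cut-off so that the middle frequencies are kept. A minimal Python sketch of that idea follows. The tokenizer, the function names and the cut-off values are illustrative assumptions, not taken from the book, and the cut-offs are expressed as frequencies rather than ranks for simplicity.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency) triples, most frequent word first."""
    # Assumed tokenizer: lowercase alphabetic runs only.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]


def middle_frequency_words(text, lower_cutoff, upper_cutoff):
    """Keep words whose frequency f satisfies lower_cutoff <= f <= upper_cutoff."""
    return [word for _, word, freq in rank_frequency(text)
            if lower_cutoff <= freq <= upper_cutoff]


# Toy text; the cut-off values below are hypothetical.
sample = "the cat sat on the mat the cat saw the dog on the mat"
table = rank_frequency(sample)
for rank, word, freq in table:
    # On a large text, a roughly constant rank * freq product would
    # correspond to the hyperbolic curve mentioned in the snippet.
    print(rank, word, freq, rank * freq)

selected = middle_frequency_words(sample, lower_cutoff=2, upper_cutoff=3)
print(selected)
```

On this toy text the very frequent word "the" falls above the upper cut-off and the words occurring once fall below the lower one, so only the middle-frequency words remain as candidate index terms.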