Document representative

Similar concepts

Similarity Concept
Operational information retrieval
Experimental information retrieval
Information retrieval system
Information retrieval definition
Automatic document classification
Retrieval effectiveness
Cluster based retrieval
Document clustering
Data retrieval systems

Pages with this concept

Similarity Page Snapshot
7 systems store only a representation of the document or query, which means that the text of a document is lost once it has been processed for the purpose of generating its representation ... When the retrieval system is on-line, it is possible for the user to change his request during one search session in the light of a sample retrieval, thereby, it is hoped, improving the subsequent retrieval run ... Secondly, the processor, that part of the retrieval system concerned with the retrieval process ...
15 linguistics in information science ... The chapter therefore starts with the original ideas of Luhn on which much of automatic text analysis has been built, and then goes on to describe a concrete way of generating document representatives ... Luhn's ideas: In one of Luhn's [6] early papers he states: It is here proposed that the frequency of word occurrence in an article furnishes a useful measurement of word significance ... I think this quote fairly summarises Luhn's contribution to automatic text analysis ... Let f be the frequency of occurrence of various word types in a given position of text and r their rank order, that is, the order of their frequency of occurrence, then a plot relating f and r yields a curve similar to the hyperbolic curve in Figure 2 ...
17 Generating document representatives: conflation. Ultimately one would like to develop a text processing system which by means of computable methods with the minimum of human intervention will generate from the input text (full text, abstract, or title) a document representative adequate for use in an automatic retrieval system ... Such a system will usually consist of three parts: (1) removal of high frequency words, (2) suffix stripping, (3) detecting equivalent stems ... The removal of high frequency words, stop words or fluff words, is one way of implementing Luhn's upper cut-off ... Table 2 ... The second stage, suffix stripping, is more complicated ... Table 2 ... (1) the length of remaining stem exceeds a given number; the default is usually 2; (2) the stem ending satisfies a certain condition, e ... Many words, which are equivalent in the above sense, map to one morphological form by removing their suffixes ...
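The three-part scheme in the page 17 snapshot (removal of high frequency words, suffix stripping, detecting equivalent stems) can be illustrated with a minimal Python sketch. It is not the procedure from the source: the stop list, the suffix list, and the function names are hypothetical placeholders, and the third stage is approximated by simply counting identical stems together. The only rule taken from the snapshot is that the remaining stem must be longer than 2 characters.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list (stage 1); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "it", "to", "for", "on", "by"}

# Hypothetical suffix list (stage 2), tried in this order; not the source's table.
SUFFIXES = ("ing", "ed", "es", "s")

# From the snapshot: the remaining stem length must exceed a given number, default 2.
MIN_STEM_EXCEEDS = 2


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def strip_suffix(word):
    """Remove the first listed suffix that leaves a long enough stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if len(stem) > MIN_STEM_EXCEEDS:
                return stem
    return word


def document_representative(text):
    """Return stem frequencies for the text after stop-word removal."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    stems = [strip_suffix(w) for w in words]
    # Stage 3 (assumption): identical stems are treated as equivalent and counted together.
    return Counter(stems)


if __name__ == "__main__":
    rep = document_representative("The retrieval of documents and the retrieving of a document")
    print(rep.most_common())
```

In this sketch "documents" and "document" fall into the same class, while "retrieval" and "retrieving" do not, which shows why a separate step for detecting equivalent stems is needed beyond plain suffix removal.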