| 17 |
Generating document representatives: conflation. Ultimately one would like to develop a text processing system which, by means of computable methods with the minimum of human intervention, will generate from the input text (full text, abstract, or title) a document representative adequate for use in an automatic retrieval system.
...Such a system will usually consist of three parts: (1) removal of high frequency words, (2) suffix stripping, (3) detecting equivalent stems.
...The removal of high frequency words, 'stop' words or 'fluff' words, is one way of implementing Luhn's upper cut-off.
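
As a minimal sketch of this first stage, stop word removal can be written as a filter over tokens. The stop list below is a hypothetical, deliberately tiny one chosen for illustration (it is not the list the text refers to), and the whitespace tokenisation is likewise a simplifying assumption.

```python
# Hypothetical stop list for illustration only; a practical list is much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "by", "with"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lower-case the text, split it on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in stop_words]


print(remove_stop_words("The removal of high frequency words"))
# prints ['removal', 'high', 'frequency', 'words']
```

Comparing each token against a fixed set keeps the step cheap; the choice of which words belong on the list is where the upper cut-off is actually decided.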
...Table 2
...The second stage, suffix stripping, is more complicated.
...Table 2
...(1) the length of remaining stem exceeds a given number; the default is usually 2; (2) the stem ending satisfies a certain condition, e
...Many words, which are equivalent in the above sense, map to one morphological form by removing their suffixes.
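
A rough sketch of suffix stripping guarded by the stem-length condition, followed by grouping words that share a stem, might look as follows. The suffix list, the function names, and the rule of trying longer suffixes first are assumptions made for illustration; the stem-ending condition is omitted because its details are not given here.

```python
from collections import defaultdict

# Hypothetical suffix list for illustration; real suffix tables are far larger.
SUFFIXES = ["ions", "ion", "ing", "ed", "s"]
MIN_STEM_LENGTH = 2  # the remaining stem must be longer than this


def strip_suffix(word, suffixes=SUFFIXES, min_stem_length=MIN_STEM_LENGTH):
    """Remove the longest listed suffix that leaves a long enough stem."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if len(stem) > min_stem_length:
                return stem
    return word  # no suffix could be removed safely


def conflate(words):
    """Group words by the stem left after suffix stripping."""
    classes = defaultdict(list)
    for word in words:
        classes[strip_suffix(word)].append(word)
    return dict(classes)


print(conflate(["connect", "connected", "connecting", "connection", "connections"]))
# prints {'connect': ['connect', 'connected', 'connecting', 'connection', 'connections']}
print(strip_suffix("bed"))
# prints bed
```

The length check is what stops a short word such as "bed" from being reduced to "b"; without it, blind removal of "ed" would conflate unrelated words.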
... |