Similar concepts
Pages with this concept
Similarity |
Page |
Snapshot |
| 17 |
Generating document representatives: conflation. Ultimately one would like to develop a text processing system which, by means of computable methods with the minimum of human intervention, will generate from the input text (full text, abstract, or title) a document representative adequate for use in an automatic retrieval system
...Such a system will usually consist of three parts: (1) removal of high frequency words, (2) suffix stripping, (3) detecting equivalent stems
...The removal of high frequency words, 'stop' words or 'fluff' words, is one way of implementing Luhn's upper cut-off
...Table 2
...The second stage, suffix stripping, is more complicated
...Table 2
...(1) the length of remaining stem exceeds a given number; the default is usually 2; (2) the stem ending satisfies a certain condition, e
...Many words, which are equivalent in the above sense, map to one morphological form by removing their suffixes
... |
| 3 |
that in IR we are searching for relevant documents as opposed to exactly matching items
...Many automatic information retrieval systems are experimental
...Many of the techniques I shall discuss will not have proved themselves incontrovertibly superior to all other techniques, but they have promise and their promise will only be realised when they are understood
...My aim throughout has been to give a complete coverage of the more important ideas current in various special areas of information retrieval
... |
| 86 |
So far we have assumed that each key was equally likely as a search argument
...At this point it is probably a good idea to point out that these efficiency considerations are largely irrelevant when it comes to representing a document classification by a tree structure
...(1) we do not have a useful linear ordering on the documents; (2) a search request normally does not seek the absence or presence of a document
...In fact, what we do have is that documents are more or less similar to each other, and a request seeks documents which in some way best match the request
...This is not to say that the above efficiency considerations are unimportant in the general context of IR
...The discussion so far has been limited to binary trees
...Finally, more comments are in order about the manipulation of tree structures in mass storage devices
... |
|
|
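The page 17 snapshot above describes a system in three parts: removal of high frequency words, suffix stripping, and detection of equivalent stems. The first two parts can be illustrated with a minimal Python sketch. The stop list, the suffix list, and the function names are illustrative assumptions, not taken from the source. Only the rule that the remaining stem must exceed a given length (default 2) comes from the snippet. The truncated second condition on the stem ending and the third stage (detecting equivalent stems) are not implemented.

```python
import re

# Illustrative stop list (an assumption; real lists are far longer).
STOP_WORDS = {"the", "of", "a", "is", "in", "for", "and", "to"}

# Illustrative suffix list (an assumption), tried in the order given.
SUFFIXES = ("ing", "ed", "es", "s")

# From the snippet: the remaining stem must exceed a given length, default 2.
MIN_STEM_LENGTH = 2


def strip_suffix(word, min_stem_length=MIN_STEM_LENGTH):
    """Remove the first listed suffix that leaves a long enough stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if len(stem) > min_stem_length:
                return stem
    return word  # no suffix could be removed safely


def representative(text):
    """Return a sorted list of stems for the non-stop words in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return sorted({strip_suffix(t) for t in kept})


if __name__ == "__main__":
    print(representative("The removal of stems is searching for indexed documents"))
```

Under these assumptions, "searching" and "indexed" reduce to "search" and "index", while a short word such as "sing" is left intact because the stem that would remain is too short.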