Page 17 Concepts and similar pages

Concepts

Similarity Concept
Document representative
Low frequency words
Stop list
Query representative
Suffixes
Rare words
Conflation
Stemming
Context sensitive suffix removal
Suffix stripping

Similar pages

Similarity Page Snapshot
22 entry in the list defining B and PT as equivalent stem endings if the preceding characters match ... The assumption in the context of IR is that if two words have the same underlying stem then they refer to the same concept and should be indexed as such ... It is inevitable that a processing system such as this will produce errors ... My description of the three stages has been deliberately undetailed, only the underlying mechanism has been explained ... Surprisingly, this kind of algorithm is not core limited but limited instead by its processing time ... The final output from a conflation algorithm is a set of classes, one for each stem detected ... Queries are of course treated in the same way ... Indexing An index language is the language used to describe documents and requests ...