Concepts and similar pages to Page 84


Concepts

Similarity

Concept

Binary tree

Terminal node

Document clustering

Document representative

Clustering

Clustered file

Cluster methods

Index sequential file

Indexing

Cluster representative

A description of the use of a sequential file in an on-line environment may be found in Negus and Hall [36] ... Work on tree structures in IR goes back a long way, as illustrated by the early papers by Salton [43], where not only methods of construction are discussed but also techniques of reorganisation ... More recently a special kind of tree, called a trie, has attracted attention ... The use of hashing in document retrieval is dealt with in Higgins and Smith [50] ... It has become fashionable to refer to document collections which
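The trie mentioned above can be illustrated with a minimal sketch. It assumes string keys (for example index terms) and branches on one character per level; the class and method names (`Trie`, `insert`, `contains`, `with_prefix`) are hypothetical and not taken from the text. Because keys sharing a prefix share a path from the root, a prefix query walks down once and then collects everything beneath that node.

```python
class TrieNode:
    """One node of a trie; each child is reached by a single character."""

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_key = False   # True if a stored key ends exactly here


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key):
        """Add key, creating one node per character not already present."""
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True

    def contains(self, key):
        """Exact-match lookup: follow one branch per character of key."""
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_key

    def with_prefix(self, prefix):
        """Return all stored keys beginning with prefix, in sorted order."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []

        def walk(n, path):
            if n.is_key:
                results.append(path)
            for ch in sorted(n.children):   # sorted children give sorted output
                walk(n.children[ch], path + ch)

        walk(node, prefix)
        return results


t = Trie()
for term in ["cluster", "clustering", "classification", "index"]:
    t.insert(term)
print(t.contains("cluster"))   # True
print(t.contains("clust"))     # False: a prefix only, not a stored key
print(t.with_prefix("clu"))    # ['cluster', 'clustering']
```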

So far we have assumed that each key was equally likely as a search argument ... At this point it is probably a good idea to point out that these efficiency considerations are largely irrelevant when it comes to representing a document classification by a tree structure ... (1) we do not have a useful linear ordering on the documents; (2) a search request normally does not seek the absence or presence of a document ... In fact, what we do have is that documents are more or less similar to each other, and a request seeks documents which in some way best match the request ... This is not to say that the above efficiency considerations are unimportant in the general context of IR ... The discussion so far has been limited to binary trees ... Finally, more comments are in order about the manipulation of tree structures in mass storage devices ...
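The equal-likelihood assumption matters because the expected search cost of a binary tree is the sum, over keys, of each key's probability times the number of comparisons needed to reach it. The sketch below assumes the implicit balanced binary tree traversed by binary search over a sorted key list; the seven keys and the probability vectors are invented purely for illustration. It shows that the same tree costs more when a deep key is requested often and less when the root key is.

```python
def comparisons(keys, target):
    """Key comparisons binary search makes to find target in sorted keys.

    Each probe corresponds to visiting one node of the implicit
    balanced binary tree, so this equals the node's depth plus one.
    """
    lo, hi = 0, len(keys) - 1
    count = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1
        if keys[mid] == target:
            return count
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    raise KeyError(target)


def expected_cost(keys, probs):
    """Expected comparisons per successful search, given key probabilities."""
    return sum(p * comparisons(keys, k) for k, p in zip(keys, probs))


keys = ["a", "b", "c", "d", "e", "f", "g"]              # "d" is the root
uniform = [1 / 7] * 7
skew_to_leaf = [0.7, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]  # "a" sits at depth 3
skew_to_root = [0.05, 0.05, 0.05, 0.7, 0.05, 0.05, 0.05]  # "d" is the root

print(round(expected_cost(keys, uniform), 3))        # 2.429
print(round(expected_cost(keys, skew_to_leaf), 3))   # 2.8
print(round(expected_cost(keys, skew_to_root), 3))   # 1.5
```

When the probabilities are known to be unequal, the frequently requested keys should therefore be placed near the root rather than wherever a balanced layout happens to put them.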

Unfortunately, in many applications one wants the ability to insert a key which has been found to be absent ... The structure of the tree as it grows is largely dependent on the order in which new keys are presented ... It would take us too far afield for me to explain the techniques for avoiding degenerate trees ...
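The dependence of tree shape on insertion order can be seen with a plain (unbalanced) binary search tree. This is a minimal sketch, not a description of any particular system; the helper names `insert`, `height` and `build` are hypothetical. Presenting the same seven keys in sorted order produces a degenerate tree that is effectively a linked list, while a different order of the same keys produces a perfectly balanced one.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key if it is absent and return the root.

    Iterative, so a long degenerate chain cannot exhaust the call stack.
    """
    if root is None:
        return Node(key)
    node = root
    while True:
        if key == node.key:
            return root                  # already present: nothing to insert
        if key < node.key:
            if node.left is None:
                node.left = Node(key)    # attach as a new leaf
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def height(root):
    """Number of nodes on the longest root-to-leaf path (0 if empty)."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


print(height(build([1, 2, 3, 4, 5, 6, 7])))   # 7: degenerate chain
print(height(build([4, 2, 6, 1, 3, 5, 7])))   # 3: perfectly balanced
```

Balancing schemes exist precisely to stop the first case from arising; they are outside the scope of this sketch, as they are of the text.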


corresponding to the maximum value of the matching function achieved within a filial set ... (1) we assume that effective retrieval can be achieved by finding just one cluster; (2) we assume that each cluster can be adequately represented by a cluster representative for the purpose of locating the cluster containing the relevant documents; (3) if the maximum of the matching function is not unique, some special action, such as a look-ahead, will need to be taken; (4) the search always terminates and will retrieve at least one document ... An immediate generalisation of this search is to allow the search to proceed down more than one branch of the tree so as to allow retrieval of more than one cluster ... The above strategies may be described as top-down searches ... If we now abandon the idea of having a multi-level clustering and accept a single-level clustering, we end up with the approach to document clustering which Salton and his co-workers have worked on extensively ...
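The top-down search described above can be sketched as follows, under several assumptions not stated in the text: cluster representatives and queries are sparse term-weight dictionaries, the matching function is the cosine coefficient, and ties are broken simply by keeping the first child in the filial set (standing in for the look-ahead the text mentions). `ClusterNode` and `top_down_search` are hypothetical names. Setting `branches=1` gives the single-cluster search; larger values give the generalisation that follows several branches and so retrieves several clusters.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ClusterNode:
    representative: dict                            # term -> weight (assumed form)
    children: list = field(default_factory=list)    # the filial set
    documents: list = field(default_factory=list)   # document ids, leaves only


def cosine(q, r):
    """Cosine matching function between two sparse term-weight vectors."""
    dot = sum(w * r.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nr = math.sqrt(sum(w * w for w in r.values()))
    return dot / (nq * nr) if nq and nr else 0.0


def top_down_search(root, query, branches=1):
    """Descend from the root, following the `branches` best-matching
    children of each filial set, and return the documents of every leaf
    cluster reached. Terminates because the tree is finite."""
    frontier, retrieved = [root], []
    while frontier:
        next_frontier = []
        for node in frontier:
            if not node.children:                    # leaf cluster reached
                retrieved.extend(node.documents)
                continue
            # sorted() is stable, so equal scores keep filial-set order
            ranked = sorted(node.children,
                            key=lambda c: cosine(query, c.representative),
                            reverse=True)
            next_frontier.extend(ranked[:branches])
        frontier = next_frontier
    return retrieved


leaf_a = ClusterNode({"tree": 1.0, "trie": 1.0}, documents=["d1", "d2"])
leaf_b = ClusterNode({"hash": 1.0, "file": 1.0}, documents=["d3"])
leaf_c = ClusterNode({"cluster": 1.0, "similarity": 1.0}, documents=["d4", "d5"])
inner = ClusterNode({"tree": 1.0, "trie": 1.0, "hash": 1.0, "file": 1.0},
                    children=[leaf_a, leaf_b])
root = ClusterNode({}, children=[inner, leaf_c])

print(top_down_search(root, {"trie": 1.0}))              # ['d1', 'd2']
print(top_down_search(root, {"trie": 1.0}, branches=2))  # ['d4', 'd5', 'd1', 'd2', 'd3']
```

With `branches=2` every cluster in this small tree is reached, which illustrates how quickly the multi-branch variant widens the search.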
