Concepts and similar pages to Page 91

Concepts

Similarity

Concept

Trees

Trie

Cluster based retrieval

Document clustering

Retrieval effectiveness

Automatic document classification

Document representative

Effectiveness

Relevance

Clustering

eventually terminate at a particular node from which no further branches will emerge ... By now it is perhaps apparent that when we were talking about ring structures and threaded lists in some of our examples we were really demonstrating how to implement a tree structure ... Another example of a tree structure is the directory associated with an index-sequential file ... The use of tree structures in computer science dates back to the early 1950s, when it was realised that the so-called binary search could readily be represented by a binary tree ...

Finally, let me recommend two very readable discussions on hashing; one is in Page and Wilson [33] ... Clustered files It is now common practice to refer to a file processed by a clustering algorithm as a clustered file, and to refer to the resulting structure as a file structure ... Bibliographic remarks There is now a vast literature on file structures although there are very few survey articles ... A general article on data structures of a more philosophical nature well worth reading is Mealey [32] ...

behaviour of any one of the components depends in only an aggregate way on the behaviour of the other components ...2 ...On the file structure chosen and the way it is used depends the efficiency of an information retrieval system ...Inverted files have been rather popular in IR systems ...There are many more problems in this area which are of interest to IR systems ...3 ...So far fairly simple search strategies have been tried ...

retrieval ... A now classic paper on the limitations of a Boolean search is Verhoeff et al ...

So far we have assumed that each key was equally likely as a search argument ... At this point it is probably a good idea to point out that these efficiency considerations are largely irrelevant when it comes to representing a document classification by a tree structure ... (1) we do not have a useful linear ordering on the documents; (2) a search request normally does not seek the absence or presence of a document ... In fact, what we do have is that documents are more or less similar to each other, and a request seeks documents which in some way best match the request ... This is not to say that the above efficiency considerations are unimportant in the general context of IR ... The discussion so far has been limited to binary trees ... Finally, more comments are in order about the manipulation of tree structures in mass storage devices ...

is no K3-list, the field reserved for its pointer could well have been omitted ... The multi-list is designed to overcome the difficulties of updating an inverted file ... Cellular multi-lists A further modification of the multi-list is inspired by the fact that many storage media are divided into pages, which can be retrieved one at a time ... At this point the full power of the notation introduced before comes into play ... Ki, ni, hi, ai1, ... where the hi have been picked to ensure that a Ki-list does not cross a page boundary ... Ring structures A ring is simply a linear list that closes upon itself ...
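
The definition of a ring in the excerpt above can be illustrated with a minimal Python sketch. The names `RingNode`, `build_ring` and `traverse` are hypothetical and do not come from the source; the sketch only assumes that the last record points back to the first, so that a traversal can start at any record and still visit all of them.

```python
class RingNode:
    """One record in a ring (hypothetical name, for illustration only)."""

    def __init__(self, value):
        self.value = value
        self.next = self  # a one-element ring points to itself


def build_ring(values):
    """Link the values into a linear list, then close it upon itself."""
    head = None
    tail = None
    for value in values:
        node = RingNode(value)
        if head is None:
            head = node
        else:
            tail.next = node
        tail = node
    if tail is not None:
        tail.next = head  # the closing link that turns the list into a ring
    return head


def traverse(start):
    """Visit every record once, starting from any record in the ring."""
    if start is None:
        return []
    visited = []
    node = start
    while True:
        visited.append(node.value)
        node = node.next
        if node is start:  # back at the starting record: the ring is complete
            break
    return visited


ring = build_ring(["D1", "D2", "D3"])
print(traverse(ring))
print(traverse(ring.next))
```

Because the structure has no distinguished end, starting the traversal from the second record still reaches the first one.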

Sparck Jones has carried on this work using measures of association between keywords based on their frequency of co-occurrence, that is, the frequency with which any two keywords occur together in the same document ... The term information structure (for want of better words) covers specifically a logical organisation of information, such as document representatives, for the purpose of information retrieval ... The organisation of these files is produced by an automatic classification method ... Evaluation of retrieval systems has proved extremely difficult ...
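
The frequency of co-occurrence mentioned in the excerpt above (how often two keywords appear together in the same document) can be sketched in a few lines of Python. This is only an illustration of the raw count, not the association measure used in the cited work; the function name `cooccurrence_counts` and the sample keyword lists are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each keyword pair, the documents in which both occur.

    `documents` is an iterable of keyword lists (hypothetical input format).
    Pairs are stored in alphabetical order so that (a, b) and (b, a)
    share a single entry.
    """
    counts = Counter()
    for keywords in documents:
        unique = sorted(set(keywords))  # each document counts a pair once
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Made-up sample data, for illustration only.
docs = [
    ["retrieval", "clustering", "index"],
    ["retrieval", "clustering"],
    ["index", "hashing"],
]
counts = cooccurrence_counts(docs)
print(counts[("clustering", "retrieval")])
```

A normalised association measure would typically divide such counts by the individual keyword frequencies; that step is left out here because the excerpt does not specify one.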

computing the intermediate dissimilarity coefficient, will need to make a choice of cluster representative ab initio ... Cluster based retrieval Cluster based retrieval has as its foundation the cluster hypothesis, which states that closely associated documents tend to be relevant to the same requests ... Suppose we have a hierarchic classification of documents; then a simple search strategy goes as follows: refer to Figure 5 ...
