Concepts and similar pages to Page 79

Concepts

Similarity

Concept

Information structure

Automatic document classification

Information content

Keyword classification

Document clustering

Document representative

Search strategies

Typical document

Maximally linked document

Information measure

is given in Figure 4 ... A modification of the implementation of a list structure like this, which makes it resemble a set of ring structures, is to make the right-hand pointer of the last element of a sublist point back to the head of the sublist ... One disadvantage associated with the use of list and ring structures for representing classifications is that they can only be entered at the top ... Another modification of the simple list representation has been studied extensively by Stanfel [21,22] and Patt ...
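The ring modification described in this excerpt can be illustrated with a minimal sketch. The names `Node`, `make_ring`, `find_head` and `members` are hypothetical, and the `is_head` flag is an assumption added so that a traversal can recognise the head; the excerpt itself only says that the last element's right-hand pointer is made to point back to the head of the sublist.

```python
class Node:
    """One element of a sublist; `right` links to the next element."""

    def __init__(self, label, is_head=False):
        self.label = label
        self.is_head = is_head  # assumption: heads are flagged explicitly
        self.right = None


def make_ring(head_label, member_labels):
    """Build a sublist under a head node and close it into a ring."""
    head = Node(head_label, is_head=True)
    current = head
    for label in member_labels:
        current.right = Node(label)
        current = current.right
    # The modification: the last element points back to the head
    # instead of ending the list.
    current.right = head
    return head


def find_head(node):
    """From any element, follow right-hand pointers round to the head."""
    while not node.is_head:
        node = node.right
    return node


def members(head):
    """Return the labels of the ring's elements, excluding the head."""
    labels = []
    node = head.right
    while node is not head:
        labels.append(node.label)
        node = node.right
    return labels
```

For example, `make_ring("cluster", ["D1", "D2", "D3"])` builds a ring in which following `right` from `D3` leads back to the head node.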

Five SEARCH STRATEGIES. Introduction. So far very little has been said about the actual process by which the required information is located ... All search strategies are based on comparison between the query and the stored documents ... The distinctions made between different kinds of search strategies can sometimes be understood by looking at the query language, that is, the language in which the information need is expressed ... Boolean search. A Boolean search strategy retrieves those documents which are true
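As a rough illustration of Boolean search, the sketch below treats each stored document as a set of keywords and retrieves the documents for which a query expression evaluates to true. The nested-tuple query format and the names `matches` and `boolean_search` are assumptions made for this example, not something taken from the excerpt.

```python
def matches(query, keywords):
    """Return True if the Boolean query is true for this keyword set.

    A query is either a keyword string or a tuple whose first item is
    "AND", "OR" or "NOT" and whose remaining items are sub-queries.
    """
    if isinstance(query, str):
        return query in keywords
    op, *args = query
    if op == "AND":
        return all(matches(arg, keywords) for arg in args)
    if op == "OR":
        return any(matches(arg, keywords) for arg in args)
    if op == "NOT":
        return not matches(args[0], keywords)
    raise ValueError(f"unknown operator: {op}")


def boolean_search(query, documents):
    """Return the sorted names of documents for which the query is true."""
    return sorted(name for name, keywords in documents.items()
                  if matches(query, keywords))


# Hypothetical collection: document name -> set of index keywords.
documents = {
    "D1": {"retrieval", "classification"},
    "D2": {"retrieval"},
    "D3": {"classification", "evaluation"},
}
```

With this collection, the query `("AND", "retrieval", ("NOT", "evaluation"))` is true for `D1` and `D2` only.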

Let us suppose that a set of documents {D1, D2, D3, D4, D5, D6, D7, D8} has been classified into four groups, that is {{D1, D2}, {D3, D4}, {D5, D6}, {D7, D8}}. Furthermore these have themselves been classified into two groups, {{{D1, D2}, {D3, D4}}, {{D5, D6}, {D7, D8}}}. The dendrogram for this structure would be that given in Figure 4 ... The Di indicates a description (representation) of a document ...
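The hierarchy in this excerpt can be sketched as a nested structure. This is an illustration only: it assumes the four groups are the consecutive pairs (D1, D2), (D3, D4), (D5, D6), (D7, D8), and the names `dendrogram`, `leaves` and `clusters_at_depth` are hypothetical.

```python
# Assumed grouping: eight documents, four pairs, two groups of two pairs.
dendrogram = (
    (("D1", "D2"), ("D3", "D4")),
    (("D5", "D6"), ("D7", "D8")),
)


def leaves(node):
    """Return all document labels below a node, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(leaves(child))
    return result


def clusters_at_depth(node, depth):
    """Cut the tree `depth` levels below the root and list the clusters."""
    if depth == 0 or isinstance(node, str):
        return [leaves(node)]
    groups = []
    for child in node:
        groups.extend(clusters_at_depth(child, depth - 1))
    return groups
```

Cutting at depth 1 gives the two large groups and cutting at depth 2 gives the four pairs, which mirrors reading the dendrogram from the top down.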

The structure of the book. The introduction presents some basic background material, demarcates the subject and discusses loosely some of the problems in IR ... The two major chapters are those dealing with automatic classification and evaluation ... Outline. Chapter 2: Automatic Text Analysis contains a straightforward discussion of how the text of a document is represented inside a computer ... Chapter 3: Automatic Classification looks at automatic classification methods in general and then takes a deeper look at the use of these methods in information retrieval ... Chapter 4: File Structures. Here we try and discuss file structures from the point of view of someone primarily interested in information retrieval ... Chapter 5: Search Strategies gives an account of some search strategies when applied to document collections structured in different ways ... Chapter 6: Probabilistic Retrieval describes a formal model for enhancing retrieval effectiveness by using sample information about the

The process may involve structuring the information in some appropriate way, such as classifying it ... Finally, we come to the output, which is usually a set of citations or document numbers ... IR in perspective. This section is not meant to constitute an attempt at an exhaustive and complete account of the historical development of IR ... Since the emphasis in this book is on a particular approach to document representation, I shall restrict myself here to a few remarks about its history ... At this point, it may be convenient to elaborate on the use of keyword ... The use of statistical information about distributions of words in documents was further exploited by Maron and Kuhns [11], who obtained statistical associations between keywords ...

Page 187

Probabilistic search strategies have not been investigated much either, although such strategies have been tried with some effect in the fields of pattern recognition and automatic medical diagnosis ... The work described in Chapter 6 goes some way to remedying this situation ... In Chapter 5 I mentioned that bottom-up search strategies are apparently more successful than the more traditional top-down searches ... spanning tree on the documents could be an effective structure for guiding a search for relevant documents ... 4 ... The three areas of research discussed so far could fruitfully be explored through a simulation model ... One major open problem is the simulation of relevance ... 5 ... This has been the most troublesome area in IR ...

If we think of a simple retrieval strategy as operating by matching on the descriptors, whether they be keyword names or class names, then expanding representatives in either of these ways will have the effect of increasing the number of matches between document and query, and hence tends to improve recall ... Recall is defined in the introduction ... Jones [41] has reported a large number of experiments using automatic keyword classifications and found that in general one obtained a better retrieval performance with the aid of automatic keyword classification than with the unclassified keywords alone ... Unfortunately, even here the evidence has not been conclusive ... The discussion of keyword classifications has by necessity been rather sketchy ... Normalisation. It is probably useful at this stage to recapitulate and show how a number of levels of normalisation of text are involved in generating document representatives ... Index term weighting can also be thought of as a process of normalisation, if the weighting scheme takes into account the number of different index terms per document ...
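The effect of expansion on recall can be made concrete with a small sketch. It assumes the usual definition of recall as the proportion of relevant documents that are retrieved, and it takes one possible reading of "expanding representatives": adding to each document the other keywords of any keyword class it already touches. The names `retrieve`, `recall` and `expand`, and the data in the usage note, are hypothetical.

```python
def retrieve(query, representatives):
    """Retrieve documents sharing at least one descriptor with the query."""
    return {doc for doc, descriptors in representatives.items()
            if query & descriptors}


def recall(retrieved, relevant):
    """Proportion of relevant documents retrieved (relevant must be non-empty)."""
    return len(retrieved & relevant) / len(relevant)


def expand(representatives, keyword_classes):
    """Add to each representative the other members of its keyword classes."""
    expanded = {}
    for doc, descriptors in representatives.items():
        extra = set()
        for class_members in keyword_classes.values():
            if descriptors & class_members:
                extra |= class_members
        expanded[doc] = descriptors | extra
    return expanded
```

Suppose `D1` is indexed by `retrieval`, `D2` by `search`, and `D3` by `evaluation`, with `D1` and `D2` relevant to the query `{"retrieval"}`. Plain matching retrieves only `D1`. If `retrieval` and `search` belong to one keyword class, expansion lets `D2` match as well, so recall rises on this toy data.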

Page 186

behaviour of any one of the components depends in only an aggregate way on the behaviour of the other components ... 2 ... On the file structure chosen and the way it is used depends the efficiency of an information retrieval system ... Inverted files have been rather popular in IR systems ... There are many more problems in this area which are of interest to IR systems ... 3 ... So far fairly simple search strategies have been tried ...
