Concepts and similar pages to Page 96


Concepts

Similarity

Concept

Index term

Generality

Term

Data retrieval systems

Index term weighting

Indexing

Probabilistic retrieval

Document clustering

Document representative

Relational structure

search ... Interactive search formulation. A user confronted with an automatic retrieval system is unlikely to be able to express his information need in one go ... (1) the frequency of occurrence in the data base of his search terms; (2) the number of documents likely to be retrieved by his query; (3) alternative and related terms to the ones used in his search; (4) a small sample of the citations likely to be retrieved; and (5) the terms used to index the citations in (4) ... All this can be conveniently provided to a user during his search session by an interactive retrieval system ... The sample of citations and their indexing will give him some idea of what kind of documents are likely to be retrieved, and thus some idea of how effective his search terms have been in expressing his information need ... Examples, both operational and experimental, of systems providing mechanisms of this kind are MEDLINE [11] ... We now look at a mathematical approach to the use of feedback, where the system automatically modifies the query ... Feedback. The word feedback is normally used to describe the mechanism by which a system can improve its performance on a task by taking
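The five search aids listed above can be sketched as follows. This is a minimal illustration, not the mechanism of MEDLINE or any other real system. It assumes a hypothetical toy collection (`CITATIONS`), treats the user's query as a conjunction (AND) of terms, and uses simple co-occurrence as a stand-in for aid (3); a real system would more likely consult a thesaurus.

```python
# Hypothetical toy collection: citation id -> set of index terms.
CITATIONS = {
    "c1": {"retrieval", "feedback", "query"},
    "c2": {"retrieval", "indexing"},
    "c3": {"feedback", "clustering"},
    "c4": {"retrieval", "feedback", "evaluation"},
}


def build_inverted_index(citations):
    """Map each index term to the set of citations indexed by it."""
    index = {}
    for cid, terms in citations.items():
        for term in terms:
            index.setdefault(term, set()).add(cid)
    return index


def search_aids(query_terms, citations, sample_size=2):
    """Gather the five search aids for a conjunctive (AND) query."""
    index = build_inverted_index(citations)
    postings = [index.get(t, set()) for t in query_terms]
    # (1) frequency of occurrence of each search term in the data base
    term_frequencies = {t: len(p) for t, p in zip(query_terms, postings)}
    # (2) number of documents the query would retrieve
    retrieved = set.intersection(*postings) if postings else set()
    # (3) related terms: assumed here to be terms co-occurring with any search term
    touched = set().union(*postings)
    related = set().union(*(citations[c] for c in touched)) - set(query_terms)
    # (4) a small sample of retrieved citations and (5) the terms indexing them
    sample = sorted(retrieved)[:sample_size]
    return {
        "term_frequencies": term_frequencies,
        "retrieved_count": len(retrieved),
        "related_terms": sorted(related),
        "sample_index_terms": {c: sorted(citations[c]) for c in sample},
    }
```

For example, `search_aids(["retrieval", "feedback"], CITATIONS)` reports that each term occurs in three citations and that the AND query would retrieve two of them (`c1` and `c4`), together with their index terms.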

Its main advantages are: (1) it is easy to implement; (2) it provides fast access to the next record using lexicographic order ... Its disadvantages: (1) it is difficult to update, since inserting a new record may require moving a large proportion of the file; (2) random access is extremely slow ... Sometimes a file is considered to be sequentially organised despite the fact that it is not ordered according to any key ... Inverted files. The importance of this file structure will become more apparent when Boolean searches are discussed in the next chapter ... An inverted file is a file structure in which every list contains only one record ... Index-sequential files. An index-sequential file is an inverted file in which for every keyword $K_i$ we have $n_i = h_i = 1$ and $a_{11} < a_{21}$ ...
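The advantages and disadvantages of a sequential file listed above can be made concrete with a small model. This sketch assumes a Python list stands in for consecutive storage and that the medium only supports reading records in order, so random access has to scan from the start; the class name `SequentialFile` is hypothetical.

```python
import bisect


class SequentialFile:
    """Toy sequentially organised file: (key, data) records kept in key order."""

    def __init__(self, records):
        self.records = sorted(records)

    def next_record(self, position):
        """Advantage (2): the next record in key order is simply position + 1."""
        if position + 1 < len(self.records):
            return self.records[position + 1]
        return None

    def insert(self, record):
        """Disadvantage (1): insert in order; return how many records had to move."""
        pos = bisect.bisect_left(self.records, record)
        moved = len(self.records) - pos  # every later record shifts one place
        self.records.insert(pos, record)
        return moved

    def random_access(self, key):
        """Disadvantage (2): scan from the start; return (record, records examined)."""
        for examined, rec in enumerate(self.records, start=1):
            if rec[0] == key:
                return rec, examined
        return None, len(self.records)
```

With records keyed 10, 20, 30, 40, inserting key 15 moves three of the four existing records, and fetching key 40 examines all five records in the file.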

So far we have assumed that each key was equally likely as a search argument ... At this point it is probably a good idea to point out that these efficiency considerations are largely irrelevant when it comes to representing a document classification by a tree structure ... (1) we do not have a useful linear ordering on the documents; (2) a search request normally does not seek the absence or presence of a document ... In fact, what we do have is that documents are more or less similar to each other, and a request seeks documents which in some way best match the request ... This is not to say that the above efficiency considerations are unimportant in the general context of IR ... The discussion so far has been limited to binary trees ... Finally, more comments are in order about the manipulation of tree structures in mass storage devices ...
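The contrast drawn above, between locating a key and finding documents that best match a request, can be illustrated with a tiny best-match search. This is a sketch under assumptions: documents and requests are plain keyword sets, and Dice's coefficient is used as one possible similarity measure; the function names are hypothetical.

```python
def dice(a, b):
    """Dice's coefficient 2|A & B| / (|A| + |B|) between two keyword sets."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def best_match(request, documents, k=2):
    """Rank documents by similarity to the request; return the top k ids.

    Ties are broken by document id so the ranking is deterministic.
    """
    ranked = sorted(documents.items(), key=lambda item: (-dice(request, item[1]), item[0]))
    return [doc_id for doc_id, _ in ranked[:k]]
```

No document needs to match the request exactly: for the request `{"document", "similarity", "search"}`, a document indexed by `{"document", "search", "similarity", "ranking"}` scores 6/7 and is ranked first even though its description differs from the request.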


behaviour of any one of the components depends in only an aggregate way on the behaviour of the other components ... 2 ... The efficiency of an information retrieval system depends on the file structure chosen and the way it is used ... Inverted files have been rather popular in IR systems ... There are many more problems in this area which are of interest to IR systems ... 3 ... So far fairly simple search strategies have been tried ...

Chapter Five: SEARCH STRATEGIES. Introduction. So far very little has been said about the actual process by which the required information is located ... All search strategies are based on comparison between the query and the stored documents ... The distinctions made between different kinds of search strategies can sometimes be understood by looking at the query language, that is, the language in which the information need is expressed ... Boolean search. A Boolean search strategy retrieves those documents which are true
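A Boolean search strategy of the kind introduced above is commonly run against an inverted file, with AND, OR and NOT mapped to set intersection, union and complement. The sketch below assumes queries are written as nested tuples such as `("AND", "retrieval", ("NOT", "information"))`; this query syntax and the function names are hypothetical.

```python
def build_inverted_file(documents):
    """Map each keyword to the set of document ids indexed by it."""
    inverted = {}
    for doc_id, terms in documents.items():
        for term in terms:
            inverted.setdefault(term, set()).add(doc_id)
    return inverted


def boolean_search(query, inverted, universe):
    """Return the set of documents for which the Boolean query is true."""
    if isinstance(query, str):  # a single keyword
        return set(inverted.get(query, set()))
    op, *args = query
    results = [boolean_search(arg, inverted, universe) for arg in args]
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set.union(*results)
    if op == "NOT":
        (only,) = results  # NOT takes exactly one operand
        return universe - only
    raise ValueError(f"unknown operator: {op}")
```

Because NOT is evaluated as a complement, the caller must supply the full set of document ids as `universe`; a document is retrieved only if the whole expression is true for its set of keywords.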


If the summations, instead of being over $A$ and $\bar{A}$, are now made over $A \cap B_i$ and $\bar{A} \cap B_i$, where $B_i$ is the set of retrieved documents on the $i$th iteration, then we have a query formulation which is optimal for $B_i$, a subset of the document collection ... where $w_1$ and $w_2$ are weighting coefficients ... Experiments have shown that relevance feedback can be very effective ... Finally, a few comments about the technique of relevance feedback in general ... Bibliographic remarks. The book by Lancaster and Fayen [16] ... has written an interesting survey article about on-line searching ...
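One iteration of the relevance feedback discussed in the excerpt above can be sketched as follows. The excerpt does not show the full update formula, so this assumes a Rocchio-style form, $Q_{i+1} = w_1 Q_i + w_2 (\text{mean of } A \cap B_i - \text{mean of } \bar{A} \cap B_i)$, with queries and documents as plain lists of term weights and without the length normalisation a full treatment may apply. The function names and default weights are hypothetical.

```python
def centroid(vectors, dim):
    """Mean of equal-length vectors; the zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]


def feedback_step(query, retrieved, relevant_ids, w1=1.0, w2=0.5):
    """Assumed Rocchio-style update of the query after one search.

    retrieved:    doc id -> term-weight vector for the retrieved set B_i
    relevant_ids: ids the user judged relevant, i.e. A intersected with B_i
    """
    dim = len(query)
    rel = [v for d, v in retrieved.items() if d in relevant_ids]
    nonrel = [v for d, v in retrieved.items() if d not in relevant_ids]
    # Move towards relevant retrieved documents, away from non-relevant ones.
    diff = [r - n for r, n in zip(centroid(rel, dim), centroid(nonrel, dim))]
    return [w1 * q + w2 * d for q, d in zip(query, diff)]
```

With query `[1.0, 0.0, 0.0]`, one relevant retrieved document `[1.0, 1.0, 0.0]` and one non-relevant retrieved document `[0.0, 0.0, 1.0]`, the modified query is `[1.5, 0.5, -0.5]`: the second term gains weight and the third is penalised. Some systems clip negative weights to zero; that choice is left out here.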
