Page 98 Concepts and similar pages

Concepts

Logical relevance
Serial search
Cosine correlation
Document representative
Data retrieval systems
Relevance
Probabilistic retrieval
Document clustering
Relevance weight
Document frequency weighting

Similar pages

Page 106: account of past performance ... Consider now a retrieval strategy that has been implemented by means of a matching function $M$ ... It is the aim of every retrieval strategy to retrieve the relevant documents $A$ and withhold the non-relevant documents $\bar{A}$ ... the decision procedure $M(Q, D) - T > 0$ corresponds to a linear discriminant function used to linearly separate two sets $A$ and $\bar{A}$ in $R^t$ ... $M(Q_0, D) > T$ whenever $D \in A$ and $M(Q_0, D) < T$ whenever $D \in \bar{A}$. The interesting thing is that starting with any $Q$ we can adjust it iteratively using feedback information so that it will converge to $Q_0$ ...
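The iterative adjustment of $Q$ towards $Q_0$ can be illustrated with a fixed-increment error-correction rule of the kind used to train linear discriminant functions (the perceptron rule). This is a minimal sketch under several assumptions. $M(Q, D)$ is taken to be the inner product. $T$ is held fixed. The function names `match` and `adjust_query` and the toy vectors are hypothetical. Convergence can only be expected when $A$ and $\bar{A}$ are linearly separable, so a cap on the number of passes is included.

```python
from typing import Sequence


def match(q: Sequence[float], d: Sequence[float]) -> float:
    # Assumed matching function M(Q, D): the inner product of query and document.
    return sum(qi * di for qi, di in zip(q, d))


def adjust_query(q, relevant, non_relevant, threshold, max_passes=100):
    """Hypothetical error-correction loop: nudge Q until M(Q, D) > T on A and < T on A-bar.

    Returns the final query and a flag saying whether the last pass needed no corrections.
    """
    q = list(q)
    labelled = [(d, True) for d in relevant] + [(d, False) for d in non_relevant]
    for _ in range(max_passes):
        corrections = 0
        for d, is_relevant in labelled:
            score = match(q, d)
            if is_relevant and score <= threshold:
                # Relevant document scored too low: move Q towards it.
                q = [qi + di for qi, di in zip(q, d)]
                corrections += 1
            elif not is_relevant and score >= threshold:
                # Non-relevant document scored too high: move Q away from it.
                q = [qi - di for qi, di in zip(q, d)]
                corrections += 1
        if corrections == 0:
            return q, True
    return q, False


# Hypothetical binary document vectors over three index terms.
relevant = [(1, 1, 0), (1, 0, 1)]
non_relevant = [(0, 1, 1), (0, 0, 1)]
q0, converged = adjust_query([0, 0, 0], relevant, non_relevant, threshold=0.5)
print(q0, converged)  # [2, 0, 0] True
```

Each correction moves $Q$ only in response to a misclassified document. Once a full pass makes no corrections, the current query separates the two sets in the sense of the inequalities above.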
Page 119: and ... The importance of writing it this way, apart from its simplicity, is that for each document $x$, to calculate $g(x)$ we simply add the coefficients $c_i$ for those index terms that are present, i ... The constant $C$, which has been assumed the same for all documents $x$, will of course vary from query to query, but it can be interpreted as the cut-off applied to the retrieval function ... Let us now turn to the other part of $g(x)$, namely $c_i$, and let us try to interpret it in terms of the conventional contingency table ... There will be one such table for each index term; I have shown it for the index term $i$, although the subscript $i$ has not been used in the cells ... This is in fact the weighting formula F4 used by Robertson and Sparck Jones [1] in their so-called retrospective experiments ...
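The scoring rule and the weight $c_i$ can be sketched from the usual contingency-table counts. Here $r$ is the number of relevant documents containing the term, $n$ the number of documents containing it, $R$ the number of relevant documents and $N$ the collection size. The sketch uses $c_i = \log \frac{r/(R-r)}{(n-r)/(N-n-R+r)}$, which is the standard form of F4. The optional constant `k` added to each cell is an assumption, not from the text: 0.5 is a common choice that avoids division by zero or $\log 0$ when a cell is empty. All names and counts are hypothetical.

```python
import math


def f4_weight(r: int, n: int, R: int, N: int, k: float = 0.0) -> float:
    """Relevance weight c_i for one index term from its contingency table.

    r: relevant documents containing the term
    n: documents containing the term
    R: relevant documents in the collection
    N: documents in the collection
    k: optional constant added to each cell (k = 0 gives F4 exactly).
    """
    rel_odds = (r + k) / (R - r + k)
    nonrel_odds = (n - r + k) / (N - n - R + r + k)
    return math.log(rel_odds / nonrel_odds)


def g(doc_terms: set[str], weights: dict[str, float]) -> float:
    # Sum c_i over the index terms present in the document (x_i = 1).
    # The constant C is omitted; it is absorbed into the cut-off below.
    return sum(w for term, w in weights.items() if term in doc_terms)


# Hypothetical counts (r, n, R, N) for two query terms.
weights = {
    "retrieval": f4_weight(2, 5, 4, 20),
    "feedback": f4_weight(3, 4, 4, 20),
}
docs = {"d1": {"retrieval", "feedback"}, "d2": {"retrieval"}, "d3": {"clustering"}}
cutoff = 2.0  # hypothetical cut-off
retrieved = [d for d, terms in docs.items() if g(terms, weights) > cutoff]
print(retrieved)  # ['d1']
```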
Page 160: system so that if we were to adopt $\Delta$ as a measure of effectiveness we could be throwing away vital information needed to make an extrapolation to the performance of other systems ... The Cooper model: expected search length. In 1968, Cooper [20] stated: "The primary function of a retrieval system is conceived to be that of saving its users to as great an extent as is possible, the labour of perusing and discarding irrelevant documents, in their search for relevant ones" ... (a) only one relevant document is wanted; (b) some arbitrary number $n$ is wanted; (c) all relevant documents are wanted; (d) a given proportion of the relevant documents is wanted, etc. ... Thus, the index is a measure of performance for a query of a given type ... The output of a search strategy is assumed to be a weak ordering of documents ...
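Cooper's measure over a weak ordering is commonly stated as $esl = j + i \cdot s/(r+1)$. Here $j$ is the number of non-relevant documents in the levels examined before the final one. In the final level, $r$ and $i$ are its relevant and non-relevant counts, and $s$ is the number of relevant documents still needed from it. The second term is the expected number of non-relevant documents preceding the $s$-th relevant one when a level is searched in random order. The sketch below follows that statement and covers need type (b). The representation of levels as `(relevant, non_relevant)` pairs and the function name are assumptions.

```python
from fractions import Fraction


def expected_search_length(levels, wanted):
    """Expected number of non-relevant documents examined before finding `wanted` relevant ones.

    levels: the weak ordering as a list of (relevant, non_relevant) counts, best level first.
    """
    j = 0  # non-relevant documents in levels that are examined completely
    s = wanted  # relevant documents still needed
    for r, i in levels:
        if s <= r:
            # Final level: order inside a level is random, so on average
            # i * s / (r + 1) non-relevant documents precede the s-th relevant one.
            return j + Fraction(i * s, r + 1)
        j += i
        s -= r
    raise ValueError("the ordering contains fewer relevant documents than wanted")


# Hypothetical weak ordering with three levels.
levels = [(0, 2), (1, 3), (2, 2)]
print(expected_search_length(levels, 2))  # 17/3
```

Exact fractions are used so that ties inside a level produce the expected value without rounding.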
Page 97: then to satisfy the $K_1$ AND $K_2$ part we intersect the $K_1$ and $K_2$ lists; to satisfy the $K_3$ AND NOT $K_4$ part we subtract the $K_4$ list from the $K_3$ list ... A slight modification of the full Boolean search is one which only allows AND logic but takes account of the actual number of terms the query has in common with a document ... For the same example as before, with the query $Q$ = $K_1$ AND $K_2$ AND $K_3$, we obtain the following ranking by co-ordination level: level 3: $D_1$, $D_2$; level 2: $D_3$; level 1: $D_4$. In fact, simple matching may be viewed as using a primitive matching function ... Matching functions: Many of the more sophisticated search strategies are implemented by means of a matching function ... There are many examples of matching functions in the literature ... If $M$ is the matching function, $D$ the set of keywords representing the document, and $Q$ the set representing the query, then: ...
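Both searches can be sketched with set operations on an inverted file. The Boolean search intersects the lists for $K_1$ AND $K_2$, subtracts the $K_4$ list from the $K_3$ list, and combines the two parts. The snippet does not show the connective between the parts, so joining them by OR is an assumption. Co-ordination level matching scores each document by $|Q \cap D|$. The document descriptions are hypothetical, chosen so that the co-ordination ranking reproduces the one quoted above.

```python
from collections import defaultdict

# Hypothetical document descriptions (sets of keywords).
docs = {
    "D1": {"K1", "K2", "K3"},
    "D2": {"K1", "K2", "K3", "K4"},
    "D3": {"K1", "K3"},
    "D4": {"K2"},
}

# Inverted file: keyword -> set of documents indexed by it.
inverted = defaultdict(set)
for doc_id, keywords in docs.items():
    for k in keywords:
        inverted[k].add(doc_id)

# (K1 AND K2) OR (K3 AND NOT K4), assuming the two parts are joined by OR.
boolean_hits = (inverted["K1"] & inverted["K2"]) | (inverted["K3"] - inverted["K4"])


def coordination_ranking(query, docs):
    """Group documents by |Q ∩ D|, the number of query terms they share, highest level first."""
    levels = defaultdict(list)
    for doc_id, keywords in docs.items():
        level = len(query & keywords)
        if level > 0:
            levels[level].append(doc_id)
    return {level: sorted(ids) for level, ids in sorted(levels.items(), reverse=True)}


print(sorted(boolean_hits))  # ['D1', 'D2', 'D3']
print(coordination_ranking({"K1", "K2", "K3"}, docs))  # {3: ['D1', 'D2'], 2: ['D3'], 1: ['D4']}
```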
Page 105: search ... Interactive search formulation: A user confronted with an automatic retrieval system is unlikely to be able to express his information need in one go ... (1) the frequency of occurrence in the data base of his search terms; (2) the number of documents likely to be retrieved by his query; (3) alternative and related terms to the ones used in his search; (4) a small sample of the citations likely to be retrieved; and (5) the terms used to index the citations in (4) ... All this can be conveniently provided to a user during his search session by an interactive retrieval system ... The sample of citations and their indexing will give him some idea of what kind of documents are likely to be retrieved and thus some idea of how effective his search terms have been in expressing his information need ... Examples, both operational and experimental, of systems providing mechanisms of this kind are MEDLINE [11] ... We now look at a mathematical approach to the use of feedback where the system automatically modifies the query ... Feedback: The word feedback is normally used to describe the mechanism by which a system can improve its performance on a task by taking ...
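Several of the listed aids fall straight out of an inverted file. This is a minimal sketch assuming an AND query: it computes item (1), the size of the retrieved set for item (2), a sample for item (4), and the sample's indexing for item (5). Item (3) would need a thesaurus and is not covered. The function `search_aids`, its parameters, and the rule "take the first few document identifiers as the sample" are hypothetical.

```python
def search_aids(query_terms, inverted, index_terms_of, sample_size=3):
    """Hypothetical helper returning aids (1), (2), (4) and (5) for an AND query.

    query_terms: terms in the user's query
    inverted: term -> set of document ids
    index_terms_of: document id -> set of index terms
    """
    # (1) frequency of occurrence of each search term in the data base
    frequencies = {t: len(inverted.get(t, set())) for t in query_terms}
    # (2) number of documents the AND query would retrieve
    postings = [inverted.get(t, set()) for t in query_terms]
    retrieved = set.intersection(*postings) if postings else set()
    # (4) a small sample of the retrieved citations, and (5) their index terms
    sample = sorted(retrieved)[:sample_size]
    sample_indexing = {d: sorted(index_terms_of[d]) for d in sample}
    return frequencies, len(retrieved), sample_indexing


# Hypothetical citations and their indexing.
index_terms_of = {
    "c1": {"feedback", "retrieval"},
    "c2": {"retrieval", "clustering"},
    "c3": {"feedback", "retrieval", "relevance"},
}
inverted = {}
for doc_id, terms in index_terms_of.items():
    for t in terms:
        inverted.setdefault(t, set()).add(doc_id)

print(search_aids(["feedback", "retrieval"], inverted, index_terms_of, sample_size=1))
# ({'feedback': 2, 'retrieval': 3}, 2, {'c1': ['feedback', 'retrieval']})
```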
Page 112: of presenting the basic theory; I have chosen to present it in such a way that connections with other fields such as pattern recognition are easily made ... The fundamental mathematical tool for this chapter is Bayes' Theorem: most of the equations derive directly from it ... This was recognised by Maron in his "The Logic Behind a Probabilistic Interpretation" as early as 1964 [4] ... Remember that the basic instrument we have for trying to separate the relevant from the non-relevant documents is a matching function, whether we are in a clustered environment or an unstructured one ... It will be assumed in the sequel that the documents are described by binary state attributes, that is, absence or presence of index terms ... Estimation or calculation of relevance: When we search a document collection, we attempt to retrieve relevant documents without retrieving non-relevant ones ...
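For a document described by binary attributes $x$, Bayes' Theorem gives $P(\text{rel} \mid x) = P(x \mid \text{rel})P(\text{rel}) / P(x)$. Here $P(x)$ is the sum over the relevant and non-relevant classes. The sketch below computes this posterior. It assumes the index terms are independent within each class, which this snippet does not state. The parameter names `p`, `q` and `prior_relevant` are hypothetical.

```python
import math


def posterior_relevance(x, p, q, prior_relevant):
    """P(relevant | x) by Bayes' Theorem for a binary document description x.

    x: 0/1 presence of each index term
    p[i]: P(x_i = 1 | relevant); q[i]: P(x_i = 1 | non-relevant)
    Terms are assumed independent within each class (an assumption of this sketch).
    """
    def likelihood(probs):
        # P(x | class) as a product over terms of P(x_i | class).
        return math.prod(pi if xi else 1 - pi for xi, pi in zip(x, probs))

    joint_rel = likelihood(p) * prior_relevant
    joint_non = likelihood(q) * (1 - prior_relevant)
    return joint_rel / (joint_rel + joint_non)


print(round(posterior_relevance([1, 0], [0.8, 0.4], [0.2, 0.5], 0.1), 3))  # 0.348
```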
Page 108: If the summations, instead of being over $A$ and $\bar{A}$, are now made over $A \cap B_i$ and $\bar{A} \cap B_i$, where $B_i$ is the set of retrieved documents on the $i$th iteration, then we have a query formulation which is optimal for $B_i$, a subset of the document collection ... where $w_1$ and $w_2$ are weighting coefficients ... Experiments have shown that relevance feedback can be very effective ... Finally, a few comments about the technique of relevance feedback in general ... Bibliographic remarks: The book by Lancaster and Fayen [16] ... has written an interesting survey article about on-line searching ...
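One feedback iteration with summations restricted to $A \cap B_i$ and $\bar{A} \cap B_i$ can be sketched as a Rocchio-style update. The snippet elides the exact formula, so its form here is an assumption. The sketch takes the new query as $w_1$ times the old query, plus $w_2$ times the difference between the mean length-normalised relevant retrieved vector and the mean length-normalised non-relevant retrieved vector. The function names are hypothetical.

```python
import math


def normalise(d):
    # Divide a document vector by its Euclidean length (zero vectors are left unchanged).
    length = math.sqrt(sum(v * v for v in d))
    return [v / length for v in d] if length else list(d)


def centroid(vectors, dim):
    # Mean of the length-normalised vectors; the zero vector if the set is empty.
    if not vectors:
        return [0.0] * dim
    normed = [normalise(v) for v in vectors]
    return [sum(col) / len(normed) for col in zip(*normed)]


def feedback_step(q, rel_retrieved, nonrel_retrieved, w1=1.0, w2=1.0):
    """Hypothetical update: Q_{i+1} = w1*Q_i + w2*(centroid of A∩B_i - centroid of Ā∩B_i)."""
    dim = len(q)
    pos = centroid(rel_retrieved, dim)
    neg = centroid(nonrel_retrieved, dim)
    return [w1 * qi + w2 * (p - n) for qi, p, n in zip(q, pos, neg)]


new_q = feedback_step([1.0, 0.0], [[1.0, 1.0]], [[0.0, 1.0]])
print([round(v, 4) for v in new_q])  # [1.7071, -0.2929]
```

The choice of $w_1$ and $w_2$ controls how far each iteration moves away from the user's original formulation.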