Similarity |
Page |
Snapshot |
| 107 |
exists, there is an iterative procedure which will ensure that $Q$ will converge to $Q_0$ in a finite number of steps
...The iterative procedure is called the fixed increment error correction procedure
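...It goes as follows:

$$Q_i = \begin{cases} Q_{i-1} + cD & \text{if } M(Q_{i-1}, D) - T < 0 \text{ and } D \in A \\ Q_{i-1} - cD & \text{if } M(Q_{i-1}, D) - T > 0 \text{ and } D \in \bar{A} \\ Q_{i-1} & \text{if } Q_{i-1} \text{ diagnoses } D \text{ correctly (no change made)} \end{cases}$$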
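The correction rule can be sketched in a few lines. This is a minimal illustration, not the book's implementation: it assumes $M$ is the inner product $(Q, D)$ (the excerpt leaves $M$ general), that documents and queries are equal-length lists of numbers, and that documents are cycled through until one full pass makes no correction. The names `fixed_increment`, `labelled_docs` and `max_passes` are hypothetical. Convergence is only guaranteed when a separating $Q_0$ exists, so the pass limit guards the other case.

```python
from typing import List, Sequence, Tuple


def inner_product(q: Sequence[float], d: Sequence[float]) -> float:
    """Matching function M(Q, D), assumed here to be the inner product."""
    return sum(qi * di for qi, di in zip(q, d))


def fixed_increment(
    query: Sequence[float],
    labelled_docs: Sequence[Tuple[Sequence[float], bool]],  # (D, True if D is in A)
    threshold: float,
    c: float = 1.0,
    max_passes: int = 1000,
) -> List[float]:
    """Fixed increment error correction: Q changes only on misdiagnosed documents."""
    q = [float(x) for x in query]
    for _ in range(max_passes):
        corrected = False
        for d, relevant in labelled_docs:
            score = inner_product(q, d) - threshold
            if relevant and score < 0:
                # relevant D scored too low: Q_i = Q_{i-1} + cD
                q = [qi + c * di for qi, di in zip(q, d)]
                corrected = True
            elif not relevant and score > 0:
                # non-relevant D scored too high: Q_i = Q_{i-1} - cD
                q = [qi - c * di for qi, di in zip(q, d)]
                corrected = True
        if not corrected:
            # every document diagnosed correctly, so Q no longer changes
            return q
    return q
```

For example, starting from a zero query with one relevant document $(1, 1)$, one non-relevant document $(1, 0)$ and $T = 0.5$, the first pass adds and then subtracts, and the second pass confirms the query $(0, 1)$.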
...The situation in actual retrieval is not as simple
...Once again this is not the whole story
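...If $M$ is taken to be the cosine function $M(Q, D) = \dfrac{(Q, D)}{\|Q\|\,\|D\|}$, then it is easy to show that $\Phi$ is maximised by

$$Q_0 = c\left(\frac{1}{|A|}\sum_{D \in A}\frac{D}{\|D\|} - \frac{1}{|\bar{A}|}\sum_{D \in \bar{A}}\frac{D}{\|D\|}\right)$$

where $c$ is an arbitrary proportionality constant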
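Assuming $\Phi$ is the mean cosine over the relevant set $A$ minus the mean cosine over the non-relevant set $\bar{A}$, its maximiser is proportional to the difference between the mean normalised relevant and non-relevant document vectors. A minimal sketch of that computation, assuming documents are non-zero lists of floats and both sets are non-empty; `optimal_query` and the default $c = 1$ are illustrative choices.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Return D / ||D|| under the Euclidean norm; D is assumed non-zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Mean of the normalised vectors D / ||D||."""
    units = [unit(v) for v in vectors]
    return [sum(col) / len(units) for col in zip(*units)]


def optimal_query(relevant, nonrelevant, c: float = 1.0) -> List[float]:
    """Q_0 = c * (mean D/||D|| over A  minus  mean D/||D|| over A-bar)."""
    rel = centroid(relevant)
    non = centroid(nonrelevant)
    return [c * (r - n) for r, n in zip(rel, non)]
```

Because the cosine ignores the length of $Q$, any positive $c$ produces the same ranking, which is why $c$ can be left arbitrary.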
... |
| 6 |
language input and storage more feasible
...The reader will have noticed that already, the idea of relevance has slipped into the discussion
...Intellectually it is possible for a human to establish the relevance of a document to a query
...An information retrieval system. Let me illustrate by means of a black box what a typical IR system would look like
...Starting with the input side of things
... |
| 108 |
If the summations, instead of being over $A$ and $\bar{A}$, are now made over $A \cap B_i$ and $\bar{A} \cap B_i$, where $B_i$ is the set of retrieved documents on the $i$th iteration, then we have a query formulation which is optimal for $B_i$, a subset of the document collection
...where $w_1$ and $w_2$ are weighting coefficients
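One feedback iteration restricted to the retrieved set $B_i$ can be sketched as follows. The feedback formula itself is not shown in this excerpt, so the combination $Q_{i+1} = w_1 Q_i + w_2 Q_{\text{opt}}(B_i)$ used here is an assumption, with $Q_{\text{opt}}(B_i)$ taken as the difference of mean normalised vectors over $A \cap B_i$ and $\bar{A} \cap B_i$. The names `feedback_step` and `mean_unit` are hypothetical, and an empty judged set contributes the zero vector.

```python
import math
from typing import List, Sequence, Tuple


def unit(v: Sequence[float]) -> List[float]:
    """D / ||D||; D is assumed non-zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_unit(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Mean of D / ||D||, or the zero vector if the set is empty."""
    if not vectors:
        return [0.0] * dim
    units = [unit(v) for v in vectors]
    return [sum(col) / len(units) for col in zip(*units)]


def feedback_step(
    query: Sequence[float],
    retrieved: Sequence[Tuple[Sequence[float], bool]],  # B_i as (D, judged relevant)
    w1: float = 1.0,
    w2: float = 1.0,
) -> List[float]:
    """Assumed form: Q_{i+1} = w1 * Q_i + w2 * Q_opt(B_i)."""
    dim = len(query)
    rel = [d for d, is_rel in retrieved if is_rel]       # A intersect B_i
    non = [d for d, is_rel in retrieved if not is_rel]   # A-bar intersect B_i
    q_opt = [a - b for a, b in zip(mean_unit(rel, dim), mean_unit(non, dim))]
    return [w1 * q + w2 * o for q, o in zip(query, q_opt)]
```

Only documents the user has actually seen and judged enter the update, which is the point of summing over $B_i$ rather than over the whole of $A$ and $\bar{A}$.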
...Experiments have shown that relevance feedback can be very effective
...Finally, a few comments about the technique of relevance feedback in general
...Bibliographic remarks. The book by Lancaster and Fayen [16] has written an interesting survey article about on-line searching
... |
| 98 |
is another example of a matching function
...A popular one used by the SMART project, which they call cosine correlation, assumes that the document and query are represented as numerical vectors in $t$-space, that is $Q = (q_1, q_2, \ldots, q_t)$ ... or, in the notation for a vector space with a Euclidean norm, $M(Q, D) = \dfrac{(Q, D)}{\|Q\|\,\|D\|} = \cos\theta$, where $\theta$ is the angle between vectors $Q$ and $D$
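Cosine correlation is the inner product of the two vectors divided by the product of their Euclidean norms. A minimal sketch; returning 0 for an all-zero query or document is an assumption made here to avoid division by zero, and `cosine_correlation` is a hypothetical name.

```python
import math
from typing import Sequence


def cosine_correlation(q: Sequence[float], d: Sequence[float]) -> float:
    """M(Q, D) = (Q, D) / (||Q|| ||D||) = cos(theta) for vectors in t-space."""
    if len(q) != len(d):
        raise ValueError("Q and D must have the same dimension t")
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_d = math.sqrt(sum(di * di for di in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # assumption: an empty query or document matches nothing
    return dot / (norm_q * norm_d)
```

Identical directions give 1, orthogonal vectors give 0, and the value is unaffected by scaling either vector.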
...Serial search. Although serial searches are acknowledged to be slow, they are frequently still used as parts of larger systems
...Suppose there are $N$ documents $D_i$ in the system; then the serial search proceeds by calculating $N$ values $M(Q, D_i)$, from which the set of documents to be retrieved is determined
...(1) the matching function is given a suitable threshold, retrieving the documents above the threshold and discarding the ones below
...(2) the documents are ranked in increasing order of matching function value
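Both ways of turning the $N$ values $M(Q, D_i)$ into a retrieved set can be sketched as follows. This assumes $M$ is a similarity (higher means a better match), so the ranked variant lists the best match first and takes a cut-off; the inner product stands in for $M$, and `serial_threshold`, `serial_ranked` and `cutoff` are hypothetical names.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Matcher = Callable[[Vector, Vector], float]


def inner_product(q: Vector, d: Vector) -> float:
    """Stand-in matching function M(Q, D)."""
    return sum(qi * di for qi, di in zip(q, d))


def serial_threshold(query: Vector, docs: Sequence[Vector], match: Matcher, threshold: float) -> List[int]:
    """Way (1): compute all N values M(Q, D_i) and keep documents above the threshold."""
    return [i for i, d in enumerate(docs) if match(query, d) > threshold]


def serial_ranked(query: Vector, docs: Sequence[Vector], match: Matcher, cutoff: int) -> List[int]:
    """Way (2): rank all N documents by M(Q, D_i) and retrieve the first `cutoff`.

    M is treated as a similarity, so the best match comes first.
    """
    scores = [match(query, d) for d in docs]  # one pass over the whole collection
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order[:cutoff]
```

Either way every document is scored, which is why serial search costs time proportional to $N$ regardless of how few documents are finally retrieved.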
... |
| 160 |
system so that if we were to adopt $\Delta$ as a measure of effectiveness we could be throwing away vital information needed to make an extrapolation to the performance of other systems
...The Cooper model: expected search length. In 1968, Cooper [20] stated: "The primary function of a retrieval system is conceived to be that of saving its users to as great an extent as is possible, the labour of perusing and discarding irrelevant documents, in their search for relevant ones"
...(a) only one relevant document is wanted; (b) some arbitrary number $n$ is wanted; (c) all relevant documents are wanted; (d) a given proportion of the relevant documents is wanted, etc.
...Thus, the index is a measure of performance for a query of given type
...The output of a search strategy is assumed to be a weak ordering of documents
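A minimal sketch of the expected search length for query type (b), $n$ relevant documents wanted, over a weak ordering. It assumes the ordering is supplied as a list of tie levels, best first, each recorded as a (relevant, non-relevant) count pair. For the final level it uses Cooper's formula from the standard presentation of the measure (not shown in this excerpt): if $s$ more relevant documents are needed from a level holding $r$ relevant and $i$ non-relevant documents in random order, the expected number of non-relevant documents examined there is $i s / (r + 1)$. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def expected_search_length(levels: Sequence[Tuple[int, int]], n: int) -> float:
    """Expected number of non-relevant documents examined before finding n relevant ones.

    levels: (relevant, non_relevant) counts for each tie level of the weak ordering,
    best level first.
    """
    needed = n
    nonrel_seen = 0  # non-relevant documents in levels examined completely
    for r, i in levels:
        if r >= needed:
            # final level: documents within a level are in random order
            return nonrel_seen + i * needed / (r + 1)
        needed -= r
        nonrel_seen += i
    raise ValueError("the ordering contains fewer than n relevant documents")
```

For instance, with levels $(1, 2)$ then $(2, 3)$ and $n = 2$, the whole first level is read (2 non-relevant), and one more relevant document is needed from the second, costing $3 \cdot 1 / 3 = 1$ non-relevant document on average, for a total of 3.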
... |
| 105 |
search
...Interactive search formulation. A user confronted with an automatic retrieval system is unlikely to be able to express his information need in one go
...(1) the frequency of occurrence in the data base of his search terms; (2) the number of documents likely to be retrieved by his query; (3) alternative and related terms to the ones used in his search; (4) a small sample of the citations likely to be retrieved; and (5) the terms used to index the citations in (4)
...All this can be conveniently provided to a user during his search session by an interactive retrieval system
...The sample of citations and their indexing will give him some idea of what kind of documents are likely to be retrieved and thus some idea of how effective his search terms have been in expressing his information need
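Aids (1), (2) and (4) from the list above can be computed from an inverted file in one call. A minimal sketch, assuming the user's terms are combined with AND and that the inverted file maps each term to a set of document numbers; `search_aids` and its parameters are hypothetical. Aids (3) and (5) need a thesaurus and the stored document index terms, so they are left out.

```python
import random
from typing import Dict, List, Set, Tuple


def search_aids(
    index: Dict[str, Set[int]], terms: List[str], sample_size: int = 3, seed: int = 0
) -> Tuple[Dict[str, int], int, List[int]]:
    """Return (term frequencies, number retrieved, sample of retrieved documents)."""
    freqs = {t: len(index.get(t, set())) for t in terms}   # aid (1): term frequencies
    postings = [index.get(t, set()) for t in terms]
    hits = set.intersection(*postings) if postings else set()  # assumed AND query
    rng = random.Random(seed)                               # fixed seed: repeatable sample
    sample = sorted(rng.sample(sorted(hits), min(sample_size, len(hits))))  # aid (4)
    return freqs, len(hits), sample                         # len(hits) is aid (2)
```

A term with zero frequency immediately tells the user that an AND query containing it will retrieve nothing, which is the kind of early warning these aids are meant to give.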
...Examples, both operational and experimental, of systems providing mechanisms of this kind are MEDLINE [11]
...We now look at a mathematical approach to the use of feedback where the system automatically modifies the query
...Feedback. The word feedback is normally used to describe the mechanism by which a system can improve its performance on a task by taking |
| 30 |
In practice, one seeks some sort of optimal trade-off between representation and discrimination
...The emphasis on representation leads to what one might call a document orientation: that is, a total preoccupation with modelling what the document is about
...This point of view is also adopted by those concerned with defining a concept of information; they assume that once this notion is properly explicated, a document can be represented by the information it contains [37]
...The emphasis on discrimination leads to a query orientation
...Automatic keyword classification. Many automatic retrieval systems rely on thesauri to modify queries and document representatives to improve the chance of retrieving relevant documents
... |
| 112 |
of presenting the basic theory; I have chosen to present it in such a way that connections with other fields such as pattern recognition are easily made
...The fundamental mathematical tool for this chapter is Bayes' Theorem: most of the equations derive directly from it
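For reference, Bayes' Theorem in the form used for relevance. The notation is an assumption, since the excerpt does not fix it: $\omega_1$ stands for relevant, $\omega_2$ for non-relevant, and $x$ for a document description.

$$P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\,P(\omega_i)}{P(x)}, \qquad P(x) = \sum_{j=1}^{2} P(x \mid \omega_j)\,P(\omega_j), \qquad i = 1, 2$$

The posterior probability of relevance given a description is thus obtained from the probability of that description among relevant and non-relevant documents, weighted by the prior probabilities of each class.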
...This was recognised by Maron in his "The Logic Behind a Probabilistic Interpretation" as early as 1964 [4]
...Remember that the basic instrument we have for trying to separate the relevant from the non-relevant documents is a matching function, whether it be that we are in a clustered environment or an unstructured one
...It will be assumed in the sequel that the documents are described by binary state attributes, that is, absence or presence of index terms
...Estimation or calculation of relevance. When we search a document collection, we attempt to retrieve relevant documents without retrieving non-relevant ones
... |
| 97 |
then to satisfy the $K_1$ AND $K_2$ part we intersect the $K_1$ and $K_2$ lists; to satisfy the $K_3$ AND NOT $K_4$ part we subtract the $K_4$ list from the $K_3$ list
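The list operations described above map directly onto set operations. A minimal sketch with a made-up inverted file: the postings are hypothetical, and joining the two parts with OR is an assumption, since the full example query is not shown in this excerpt.

```python
from typing import Dict, Set

# Hypothetical inverted file: keyword -> set of document numbers.
index: Dict[str, Set[int]] = {
    "K1": {1, 2, 3, 5},
    "K2": {1, 2, 4},
    "K3": {3, 4, 6},
    "K4": {4},
}

part1 = index["K1"] & index["K2"]  # K1 AND K2: intersect the two lists
part2 = index["K3"] - index["K4"]  # K3 AND NOT K4: subtract the K4 list from the K3 list
result = part1 | part2             # assumption: the two parts are joined by OR

print(sorted(result))
```

With these postings the two parts are $\{1, 2\}$ and $\{3, 6\}$, so the search retrieves documents 1, 2, 3 and 6.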
...A slight modification of the full Boolean search is one which only allows AND logic but takes account of the actual number of terms the query has in common with a document
...For the same example as before, with the query $Q = K_1 \text{ AND } K_2 \text{ AND } K_3$, we obtain the following ranking:

Co-ordination level    Documents
3                      $D_1$, $D_2$
2                      $D_3$
1                      $D_4$

In fact, simple matching may be viewed as using a primitive matching function
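Co-ordination level ranking counts the query terms each document shares with an AND-only query and groups documents by that count. A minimal sketch; the document index sets below are hypothetical, chosen only so that the output reproduces the ranking above, and leaving out documents that share no term is an assumption. `coordination_levels` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Set


def coordination_levels(query: Set[str], docs: Dict[str, Set[str]]) -> Dict[int, List[str]]:
    """Group documents by |Q intersect D|, highest co-ordination level first."""
    levels: Dict[int, List[str]] = defaultdict(list)
    for name, keywords in docs.items():
        level = len(query & keywords)  # number of query terms the document shares
        if level > 0:                  # assumption: documents sharing no term are not retrieved
            levels[level].append(name)
    return dict(sorted(levels.items(), reverse=True))


# Hypothetical index sets for D1..D5.
docs = {
    "D1": {"K1", "K2", "K3"},
    "D2": {"K1", "K2", "K3", "K4"},
    "D3": {"K1", "K3"},
    "D4": {"K2", "K4"},
    "D5": {"K4"},
}
print(coordination_levels({"K1", "K2", "K3"}, docs))
```

Here $|Q \cap D|$ plays the role of the primitive matching function: $D_1$ and $D_2$ share all three terms, $D_3$ shares two, $D_4$ one, and $D_5$ none.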
...Matching functions Many of the more sophisticated search strategies are implemented by means of a matching function
...There are many examples of matching functions in the literature
...If M is the matching function,D the set of keywords representing the document,and Q the set representing the query,then: |