Concept: Probability of relevance

Similar concepts

Concept

Experimental information retrieval

Information retrieval system

Information retrieval definition

Operational information retrieval

Retrieval effectiveness

Information measure

Data retrieval systems

Probabilistic retrieval

Relevance

Index term

Pages with this concept

Page

Snapshot

113

any given document whether it is relevant or non-relevant ... P_Q(relevance | document), where the Q is meant to emphasise that it is for a specific query ... P(relevance | document) ... Let us now assume (following Robertson [7]) that: 1. The relevance of a document to a request is independent of other documents in the collection ... With this assumption we can now state a principle, in terms of probability of relevance, which shows that probabilistic information can be used in an optimal manner in retrieval ... The probability ranking principle ...
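As an illustration of the principle named in this snapshot, the sketch below ranks documents in decreasing order of their estimated probability of relevance to one query. It is a minimal sketch, not the book's own formulation: the function name `rank_by_probability`, the document identifiers, and the probability values are all hypothetical, and the estimates are assumed to have been computed already.

```python
from typing import Dict, List, Tuple


def rank_by_probability(p_relevance: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order documents by decreasing estimated probability of relevance."""
    return sorted(p_relevance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical estimates of P_Q(relevance | document) for a single query Q.
estimates = {"d1": 0.20, "d2": 0.75, "d3": 0.50}

ranking = rank_by_probability(estimates)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

Because each document is scored on its own, the ranking relies on the independence assumption quoted in the snapshot: the relevance of one document does not depend on the others in the collection.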

128

objected to on the same grounds that one might object to the probability of Newton's Second Law of Motion being the case ... To approach the problem in this way would be useless unless one believed that for many index terms the distribution over the relevant documents is different from that over the non-relevant documents ... The elaboration in terms of ranking rather than just discrimination is trivial: the cut-off set by the constant in g(x) is gradually relaxed, thereby increasing the number of documents retrieved or assigned to the relevant category ... If one is prepared to let the user set the cut-off after retrieval has taken place then the need for a theory about cut-off disappears ...
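The effect of relaxing the cut-off can be sketched as follows. This is only an illustration under stated assumptions: the scores stand in for values of g(x), the cut-off is modelled as a separate threshold rather than as the constant inside g(x), and the names `retrieve`, `scores`, and the numeric values are hypothetical.

```python
from typing import Dict, List


def retrieve(scores: Dict[str, float], cutoff: float) -> List[str]:
    """Return the documents whose score exceeds the cut-off."""
    return sorted(doc_id for doc_id, g in scores.items() if g > cutoff)


# Hypothetical discriminant scores g(x), one per document.
scores = {"d1": -0.4, "d2": 1.3, "d3": 0.2, "d4": 0.9}

# Lowering the cut-off step by step admits more documents each time.
cutoffs = (1.0, 0.5, 0.0, -1.0)
retrieved_sizes = [len(retrieve(scores, c)) for c in cutoffs]
```

Reading the retrieved sets in the order the cut-off is lowered gives the same result as sorting the documents by score, which is why the step from discrimination to ranking is described as trivial.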

127

function ... One important reason for having estimation rules different from the simple x/n is that this is rather unrealistic for small samples ... 0 < p < 1 ... This is really as much as I wish to say about estimation rules, and therefore I shall not push the technical discussion on this point any further; the interested reader should consult the readily accessible statistical literature ... Recapitulation: At this point I should like to summarise the formal argument thus far so that we may reduce it to simple English ... The first point to make, then, is that we have been trying to estimate P(relevance | document), that is, the probability of relevance for a given document ...
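A small numerical sketch shows why the simple x/n rule is awkward for small samples: with only a few observations it readily returns exactly 0 or 1, which falls outside the open interval 0 < p < 1. The alternative shown here, (x + a) / (n + a + b) with a = b = 0.5, is one common smoothing choice and is an assumption of this sketch; the snapshot does not say which rule the text adopts. The function names are hypothetical.

```python
def simple_estimate(x: int, n: int) -> float:
    """Plain proportion x/n: x occurrences observed in a sample of size n."""
    return x / n


def smoothed_estimate(x: int, n: int, a: float = 0.5, b: float = 0.5) -> float:
    """Smoothed proportion (x + a) / (n + a + b); a and b are assumed constants."""
    return (x + a) / (n + a + b)


# A term seen in none of three sampled relevant documents.
p_simple = simple_estimate(0, 3)      # 0.0, an extreme estimate from tiny evidence
p_smoothed = smoothed_estimate(0, 3)  # 0.125, strictly between 0 and 1
```

With a sample of three, the simple rule claims the event is impossible, while the smoothed rule keeps the estimate inside (0, 1) and moves toward x/n as n grows.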

The model also assumes that a document can be about a word to some degree ... Harter [31] has identified two assumptions, based upon which the above ideas can be used to provide a method of automatic indexing ... 1. The probability that a document will be found relevant to a request for information on a subject is a function of the relative extent to which the topic is treated in the document ... 2. The number of tokens in a document is a function of the extent to which the subject referred to by the word is treated in the document ... In these assumptions a topic is identified with the subject of the request and with the subject referred to by the word ...