Similarity | Page | Snapshot |
| 129 |
we work with the ratio ... In the latter case we do not see the retrieval problem as one of discriminating between relevant and non-relevant documents; instead we merely wish to compute $P(\text{relevance} \mid x)$ for each document $x$ and present the user with documents in decreasing order of this probability
...The decision rules derived above are couched in terms of $P(x \mid w_i)$
...I will now proceed to discuss ways of using this probabilistic model of retrieval and at the same time discuss some of the practical problems that arise
...The curse of dimensionality. In deriving the decision rules I assumed that a document is represented by an $n$-dimensional vector, where $n$ is the size of the index term vocabulary
... |
| 118 |
Theorem is the best way of getting at it
...$P(x \mid w_i) = P(x_1 \mid w_i)\, P(x_2 \mid w_i) \cdots$
...Later I shall show how this stringent assumption may be relaxed
...Let us now take the simplified form of $P(x \mid w_i)$ and work out what the decision rule will look like
...$p_i = \mathrm{Prob}(x_i = 1 \mid w_1)$, $q_i = \mathrm{Prob}(x_i = 1 \mid w_2)$
...In words, $p_i$ ($q_i$) is the probability that, if the document is relevant (non-relevant), the $i$th index term will be present
...To appreciate how these expressions work, the reader should check that $P((0,1,1,0,0,1) \mid w_1) = (1 - p_1)\, p_2\, p_3\, (1 - p_4)(1 - p_5)\, p_6$
...where the constants $a_i$, $b_i$ and $e$ are obvious
... |
| 42 |
nice property of being invariant under one-to-one transformations of the co-ordinates
...A function very similar to the expected mutual information measure was suggested by Jardine and Sibson [2] specifically to measure dissimilarity between two classes of objects
...Here $u$ and $v$ are positive weights adding to unity
...$P(x) = P(x \mid w_1)\, P(w_1) + P(x \mid w_2)\, P(w_2)$, $x = 0, 1$; $P(x, w_i) = P(x \mid w_i)\, P(w_i)$, $i = 1, 2$, we recover the expected mutual information measure $I(x, w_i)$
... |
| 120 |
convenience let us set ... There are a number of ways of looking at $K_i$
...Typically the weight $K_i(N, r, n, R)$ is estimated from a contingency table in which $N$ is not the total number of documents in the system but instead is some subset specifically chosen to enable $K_i$ to be estimated
...The index terms are not independent. Although it may be mathematically convenient to assume that the index terms are independent, it by no means follows that it is realistic to do so
... |
| 140 |
derives from the work of Yu and his collaborators [28,29] ...According to Doyle [32] p
...The model in this chapter also connects with two other ideas in earlier research
...or in words, for any document the probability of relevance is inversely proportional to the probability with which it will occur on a random basis
... |
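
The page 118 snapshot describes how, under the term-independence assumption, $P(x \mid w_i)$ factorises into per-term probabilities $p_i$ and $q_i$. The sketch below is only an illustration of that idea, not the book's own procedure: the function names and the numeric values of `p` and `q` are hypothetical, and the per-term weight used is the standard log-odds form $\log[p_i(1 - q_i) / (q_i(1 - p_i))]$, which differs from the log likelihood ratio only by a constant that does not depend on the document.

```python
from math import log, prod


def likelihood(x, probs):
    """P(x | w) under term independence: p_i if term i is present, else 1 - p_i."""
    return prod(p if xi else 1.0 - p for xi, p in zip(x, probs))


def term_weight(p, q):
    """Log-odds weight of a single term (assumed form, see lead-in)."""
    return log(p * (1.0 - q) / (q * (1.0 - p)))


def score(x, p, q):
    """Sum the weights of the terms present in binary document vector x."""
    return sum(term_weight(pi, qi) for xi, pi, qi in zip(x, p, q) if xi)


def rank(docs, p, q):
    """Order documents by decreasing score."""
    return sorted(docs, key=lambda d: score(d, p, q), reverse=True)


# Hypothetical per-term probabilities, chosen only for illustration.
p = [0.8, 0.6, 0.5, 0.3, 0.2, 0.7]  # Prob(x_i = 1 | relevant)
q = [0.3, 0.4, 0.5, 0.3, 0.1, 0.2]  # Prob(x_i = 1 | non-relevant)

x = (0, 1, 1, 0, 0, 1)
# Matches (1 - p1) p2 p3 (1 - p4) (1 - p5) p6 from the snapshot.
by_hand = (1 - p[0]) * p[1] * p[2] * (1 - p[3]) * (1 - p[4]) * p[5]

# Document-independent constant separating score from the log likelihood ratio.
constant = sum(log((1.0 - pi) / (1.0 - qi)) for pi, qi in zip(p, q))
```

Because `constant` is the same for every document, ranking by `score` gives the same order as ranking by the likelihood ratio `likelihood(x, p) / likelihood(x, q)`.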