| 115 |
Basic probabilistic model Since we are assuming that each document is described by the presence absence of index terms any document can be represented by a binary vector,x x 1,x 2,...where xi 0 or 1 indicates absence or presence of the ith index term
...w 1 document is relevant w 2 document is non relevant
...The theory that follows is at first rather abstract,the reader is asked to bear with it,since we soon return to the nuts and bolts of retrieval
...So,in terms of these symbols,what we wish to calculate for each document is P w 1 x and perhaps P w 2 x so that we may decide which is relevant and which is non relevant
...Here P wi is the prior probability of relevance i 1 or non relevance i 2,P x wi is proportional to what is commonly known as the likelihood of relevance or non relevance given x;in the continuous case this would be a density function and we would write p x wi
...which is the probability of observing x on a random basis given that it may be either relevant or non relevant
... |