Similar concepts
Pages with this concept
Similarity |
Page |
Snapshot |
| 41 |
keyword is indicated by a zero or one in the i-th position respectively
...where summation is over the total number of different keywords in the document collection
...Salton considered document representatives as binary vectors embedded in an n-dimensional Euclidean space, where n is the total number of index terms
...can then be interpreted as the cosine of the angular separation of the two binary vectors X and Y
...where $(X, Y)$ is the inner product and
...X = (x_1,...we get ...Some authors have attempted to base a measure of association on a probabilistic model [18]...When $x_i$ and $x_j$ are independent, $P(x_i)P(x_j) = P(x_i, x_j)$ and so $I(x_i, x_j) = 0$
... |
| 39 |
There are five commonly used measures of association in information retrieval
...The simplest of all association measures is $|X \cap Y|$ (simple matching coefficient), which is the number of shared index terms
...These may all be considered to be normalised versions of the simple matching coefficient
...then $|X_1| = 1$, $|Y_1| = 1$, $|X_1 \cap Y_1| = 1 \Rightarrow S_1 = 1$, $S_2 = 1$; $|X_2| = 10$, $|Y_2| = 10$, $|X_2 \cap Y_2| = 1 \Rightarrow S_1 = 1$, $S_2 = 1/10$. So $S_1(X_1, Y_1) = S_1(X_2, Y_2)$, which is clearly absurd since $X_1$ and $Y_1$ are identical representatives whereas $X_2$ and $Y_2$ are radically different
...Doyle [17] hinted at the importance of normalisation in an amusing way: One would regard the postulate "All documents are created equal" as being a reasonable foundation for a library description
... |
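The normalisation example in the page 39 snippet can be checked with a short sketch. It treats document representatives as Python sets of index terms. It assumes that S1 is the simple matching coefficient and that S2 is Dice's coefficient, 2|X ∩ Y| / (|X| + |Y|); the snippet does not define S2, but that choice reproduces the values 1 and 1/10 quoted there. The cosine coefficient from the page 41 snippet is included for comparison. The function names and the sample terms are illustrative only.

```python
from math import sqrt


def simple_matching(x: set, y: set) -> int:
    """S1: number of shared index terms, |X ∩ Y| (not normalised)."""
    return len(x & y)


def dice(x: set, y: set) -> float:
    """Assumed S2: Dice's coefficient, 2|X ∩ Y| / (|X| + |Y|)."""
    return 2 * len(x & y) / (len(x) + len(y))


def cosine(x: set, y: set) -> float:
    """Cosine of the angle between the binary vectors for X and Y."""
    return len(x & y) / sqrt(len(x) * len(y))


# X1 and Y1: identical representatives with a single index term.
x1 = {"shared"}
y1 = {"shared"}

# X2 and Y2: ten terms each, only one of them in common.
x2 = {f"a{i}" for i in range(9)} | {"shared"}
y2 = {f"b{i}" for i in range(9)} | {"shared"}

for name, measure in [("S1", simple_matching), ("Dice", dice), ("cosine", cosine)]:
    print(name, measure(x1, y1), measure(x2, y2))
```

Under these assumptions the simple matching coefficient gives the same score to both pairs, while the two normalised measures separate the identical pair from the nearly disjoint one.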