Cosine coefficient

Similar concepts

Similarity Concept
Document clustering
Clustering
Maximally linked document
Document representative
Typical document
Matching function
Document frequency weighting
Automatic document classification
Association Measures

Pages with this concept

Similarity Page Snapshot
41 keyword is indicated by a zero or one in the i-th position respectively ... where summation is over the total number of different keywords in the document collection ... Salton considered document representatives as binary vectors embedded in an n-dimensional Euclidean space, where n is the total number of index terms ... can then be interpreted as the cosine of the angular separation of the two binary vectors X and Y ... where (X, Y) is the inner product and ... X = (x1, ... we get ... Some authors have attempted to base a measure of association on a probabilistic model [18] ... When xi and xj are independent, P(xi)P(xj) = P(xi, xj) and so I(xi, xj) = 0 ...
39 There are five commonly used measures of association in information retrieval ... The simplest of all association measures is |X ∩ Y| (simple matching coefficient), which is the number of shared index terms ... These may all be considered to be normalised versions of the simple matching coefficient ... then |X1| = 1, |Y1| = 1, |X1 ∩ Y1| = 1 => S1 = 1, S2 = 1; |X2| = 10, |Y2| = 10, |X2 ∩ Y2| = 1 => S1 = 1, S2 = 1/10; S1(X1, Y1) = S1(X2, Y2), which is clearly absurd since X1 and Y1 are identical representatives whereas X2 and Y2 are radically different ... Doyle [17] hinted at the importance of normalisation in an amusing way: One would regard the postulate "All documents are created equal" as being a reasonable foundation for a library description ...
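
The contrast drawn in the snapshots above can be checked numerically. The following is a minimal sketch, assuming document representatives are modelled as Python sets of index terms (equivalent to binary vectors) and using the standard form of the cosine coefficient for binary vectors, |X ∩ Y| / sqrt(|X| |Y|). The function names and the term labels t0, t1, ... are hypothetical and chosen only for illustration.

```python
from math import sqrt


def simple_matching(x, y):
    """Simple matching coefficient: the number of shared index terms."""
    return len(x & y)


def cosine_coefficient(x, y):
    """Cosine of the angle between two binary vectors, given as term sets.

    For binary vectors the inner product is |X ∩ Y| and each vector's
    length is the square root of its number of terms.
    """
    if not x or not y:
        return 0.0  # assumption: an empty representative matches nothing
    return len(x & y) / sqrt(len(x) * len(y))


# X1 and Y1 are identical one-term representatives.
x1 = {"t0"}
y1 = {"t0"}

# X2 and Y2 each hold ten terms but share only one ("t9").
x2 = {f"t{i}" for i in range(10)}
y2 = {f"t{i}" for i in range(9, 19)}

for name, (x, y) in {"X1,Y1": (x1, y1), "X2,Y2": (x2, y2)}.items():
    print(name, simple_matching(x, y), cosine_coefficient(x, y))
```

Under these assumptions the simple matching coefficient scores both pairs as 1, whereas the cosine coefficient gives 1 for the identical pair and 1/10 for the pair with one shared term out of ten, which is the effect of normalisation that the snapshot describes.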