Similar concepts
Pages with this concept
| Similarity | Page | Snapshot |
| 39 |
There are five commonly used measures of association in information retrieval
...The simplest of all association measures is $|X \cap Y|$, the simple matching coefficient, which is the number of shared index terms
...These may all be considered to be normalised versions of the simple matching coefficient
...then $|X_1| = 1$, $|Y_1| = 1$, $|X_1 \cap Y_1| = 1 \Rightarrow S_1 = 1$, $S_2 = 1$; $|X_2| = 10$, $|Y_2| = 10$, $|X_2 \cap Y_2| = 1 \Rightarrow S_1 = 1$, $S_2 = 1/10$; so $S_1(X_1, Y_1) = S_1(X_2, Y_2)$, which is clearly absurd since $X_1$ and $Y_1$ are identical representatives whereas $X_2$ and $Y_2$ are radically different
...Doyle [17] hinted at the importance of normalisation in an amusing way: One would regard the postulate "All documents are created equal" as being a reasonable foundation for a library description
... |
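The contrast between the raw overlap count and a normalised measure can be sketched in a few lines of Python. This is an illustration only. The excerpt does not show how S1 and S2 are defined, so the sketch assumes S1 is the simple matching coefficient $|X \cap Y|$ and S2 is Dice's coefficient $2|X \cap Y| / (|X| + |Y|)$, a choice that reproduces the values quoted in the excerpt. The function names and the index terms are hypothetical.

```python
from fractions import Fraction


def simple_matching(x, y):
    """Assumed S1: the number of shared index terms, |X n Y|."""
    return len(x & y)


def dice(x, y):
    """Assumed S2: Dice's coefficient, 2|X n Y| / (|X| + |Y|).

    Both sets are assumed non-empty, otherwise the denominator is zero.
    """
    return Fraction(2 * len(x & y), len(x) + len(y))


# Hypothetical representatives: X1 and Y1 are identical one-term sets.
x1 = {"retrieval"}
y1 = {"retrieval"}

# X2 and Y2 each hold ten terms but share only one of them.
x2 = {f"a{i}" for i in range(9)} | {"shared"}
y2 = {f"b{i}" for i in range(9)} | {"shared"}

s1_pair1, s1_pair2 = simple_matching(x1, y1), simple_matching(x2, y2)
s2_pair1, s2_pair2 = dice(x1, y1), dice(x2, y2)
```

Under these assumptions the unnormalised count cannot tell the two pairs apart, while the normalised measure separates them by a factor of ten.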
| 40 |
pertain to documents, such as index tags, being careful of course to deal with the same number of index tags for each document
...I now return to the promised mathematical definition of dissimilarity
...If $P$ is the set of objects to be clustered, a pairwise dissimilarity coefficient $D$ is a function from $P \times P$ to the non-negative real numbers
...D1: $D(X, Y) \geq 0$ for all $X, Y \in P$; D2: $D(X, X) = 0$ for all $X \in P$; D3: $D(X, Y) = D(Y, X)$ for all $X, Y \in P$. Informally, a dissimilarity coefficient is a kind of distance function
...D4: $D(X, Y) \leq D(X, Z) + D(Y, Z)$, which may be recognised as the theorem from Euclidean geometry which states that the sum of the lengths of two sides of a triangle is always greater than the length of the third side
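The four conditions can be checked by brute force on a small finite collection. The sketch below is a hypothetical helper, not something taken from the source. It assumes the objects can be enumerated and that `d` is an ordinary Python function of two arguments. The small tolerance in the triangle test is an added assumption to absorb floating-point rounding.

```python
from itertools import product


def check_axioms(objects, d, tol=1e-12):
    """Test D1 to D4 exhaustively for the function d over a finite collection."""
    objs = list(objects)
    pairs = list(product(objs, repeat=2))
    triples = product(objs, repeat=3)
    return {
        # D1: dissimilarities are never negative.
        "D1": all(d(x, y) >= 0 for x, y in pairs),
        # D2: every object is at zero dissimilarity from itself.
        "D2": all(d(x, x) == 0 for x in objs),
        # D3: the coefficient is symmetric.
        "D3": all(d(x, y) == d(y, x) for x, y in pairs),
        # D4: triangle inequality, D(X,Y) <= D(X,Z) + D(Y,Z).
        "D4": all(d(x, y) <= d(x, z) + d(y, z) + tol for x, y, z in triples),
    }


# Absolute difference on numbers satisfies all four conditions.
abs_result = check_axioms([0, 1, 2, 5], lambda a, b: abs(a - b))

# Squared difference satisfies D1 to D3 but breaks D4:
# d(0, 2) = 4 exceeds d(0, 1) + d(2, 1) = 2.
sq_result = check_axioms([0, 1, 2], lambda a, b: (a - b) ** 2)
```

A check of this kind only shows that no violation occurs on the sample supplied. It does not prove that a coefficient satisfies the conditions in general.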
...An example of a dissimilarity coefficient satisfying D1-D4 is ... where $X \Delta Y = (X \cup Y) - (X \cap Y)$ is the symmetric difference of sets $X$ and $Y$
...and is monotone with respect to Jaccard's coefficient subtracted from 1
... |
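The symmetric difference and its relation to Jaccard's coefficient can be explored with Python sets. The excerpt omits the coefficient itself, so the sketch below makes an explicit assumption: it normalises $|X \Delta Y|$ by $|X| + |Y|$. That normalisation, the function names, and the sample sets are illustrative choices rather than content confirmed by the source.

```python
from fractions import Fraction


def sym_diff_dissimilarity(x, y):
    """Assumed form: |X delta Y| / (|X| + |Y|), for non-empty sets."""
    return Fraction(len(x ^ y), len(x) + len(y))


def jaccard_distance(x, y):
    """Jaccard's coefficient subtracted from 1: 1 - |X n Y| / |X u Y|."""
    return 1 - Fraction(len(x & y), len(x | y))


# Hypothetical index-term sets.
x = {"a", "b", "c"}
y = {"b", "c", "d", "e"}

# Python's ^ operator on sets is the symmetric difference (X u Y) - (X n Y).
same_definition = (x ^ y) == (x | y) - (x & y)

d = sym_diff_dissimilarity(x, y)
j = jaccard_distance(x, y)

# Under the assumed normalisation, d = j / (2 - j), which increases with j
# on [0, 1]. That is one way to read "monotone with respect to" 1 - Jaccard.
relation_holds = d == j / (2 - j)
```

For these sample sets the assumed coefficient is 3/7 and the Jaccard-based value is 3/5, so the two measures differ in value but order pairs of sets in the same way.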