Concept: Dissimilarity coefficient

Similar concepts

Similarity

Concept

Document clustering

Cluster methods

Document representative

Heuristic cluster methods

Clustering

Stratified hierarchic cluster methods

Maximally linked document

Cluster representative

Hierarchic cluster methods

Graph theoretic cluster methods

Pages with this concept

Similarity

Page

Snapshot

the hierarchy one can identify a set of classes, and as one moves up the hierarchy the classes at the lower levels are nested in the classes at the higher levels ... It is now a simple matter to define single link in terms of these graphs; at any level a single link cluster is precisely the set of vertices of a connected component of the graph at that level ... corresponding clusters at those levels ...
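The graph-theoretic definition above can be sketched in a few lines of Python. This is a minimal illustration, not the text's own procedure: the function name `single_link_clusters`, the use of union-find, and the convention that two objects are linked when their dissimilarity is at most the level are all assumptions made for the example.

```python
from itertools import combinations


def single_link_clusters(objects, dissim, level):
    """Connected components of the graph that links every pair of objects
    whose dissimilarity is at most `level` (assumed threshold convention)."""
    parent = {x: x for x in objects}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Add an edge (merge two components) for every sufficiently close pair.
    for a, b in combinations(objects, 2):
        if dissim(a, b) <= level:
            parent[find(a)] = find(b)

    # Group the objects by the root of their component.
    clusters = {}
    for x in objects:
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical data: four objects placed on a line, dissimilarity = distance.
position = {"A": 0, "B": 1, "C": 3, "D": 6}


def distance(a, b):
    return abs(position[a] - position[b])


for level in (1, 2, 3):
    print(level, single_link_clusters(list(position), distance, level))
```

Raising the level only adds edges, so the clusters found at a lower level are always contained in clusters found at a higher level, which is the nesting the excerpt describes.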

pertain to documents, such as index tags, being careful of course to deal with the same number of index tags for each document ... I now return to the promised mathematical definition of dissimilarity ... If P is the set of objects to be clustered, a pairwise dissimilarity coefficient D is a function from P × P to the non-negative real numbers ... D1: D(X, Y) ≥ 0 for all X, Y ∈ P; D2: D(X, X) = 0 for all X ∈ P; D3: D(X, Y) = D(Y, X) for all X, Y ∈ P. Informally, a dissimilarity coefficient is a kind of distance function ... D4: D(X, Y) ≤ D(X, Z) + D(Y, Z), which may be recognised as the theorem from Euclidean geometry which states that the sum of the lengths of two sides of a triangle is always greater than the length of the third side ... An example of a dissimilarity coefficient satisfying D1 to D4 is ... where X Δ Y = (X ∪ Y) − (X ∩ Y) is the symmetric difference of sets X and Y ... and is monotone with respect to Jaccard's coefficient subtracted from 1 ...
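As a rough illustration of conditions D1 to D4, the sketch below checks them by brute force on a small collection. The excerpt elides the exact formula of its example, so the code uses Jaccard's coefficient subtracted from 1 (which the excerpt names) as a stand-in, written in terms of the symmetric difference. The function names, the convention that two empty sets have dissimilarity 0, and the sample index-term sets are assumptions for the example.

```python
from itertools import product


def jaccard_dissimilarity(x, y):
    """|X Δ Y| / |X ∪ Y|, i.e. Jaccard's coefficient subtracted from 1."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 0.0  # assumed convention: two empty sets are identical
    return len(x ^ y) / len(union)  # x ^ y is the symmetric difference


def satisfies_d1_to_d4(objects, d, tol=1e-12):
    """Brute-force check of D1 to D4 over every pair and triple of objects."""
    for x, y in product(objects, repeat=2):
        if d(x, y) < 0:                      # D1: non-negativity
            return False
        if abs(d(x, y) - d(y, x)) > tol:     # D3: symmetry
            return False
    if any(d(x, x) != 0 for x in objects):   # D2: zero self-dissimilarity
        return False
    for x, y, z in product(objects, repeat=3):
        if d(x, y) > d(x, z) + d(y, z) + tol:  # D4: triangle inequality
            return False
    return True


# Hypothetical documents represented as sets of index terms.
docs = [
    frozenset({"retrieval", "cluster"}),
    frozenset({"retrieval", "index"}),
    frozenset({"cluster", "hierarchy", "graph"}),
    frozenset(),
]
print(satisfies_d1_to_d4(docs, jaccard_dissimilarity))

# Squared difference satisfies D1 to D3 but not D4, so the check fails.
print(satisfies_d1_to_d4([0, 1, 2], lambda a, b: (a - b) ** 2))
```

The second call shows that D4 is a genuinely extra requirement: a function can be non-negative, zero on identical objects, and symmetric without obeying the triangle inequality.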

This description immediately leads to an inefficient algorithm for the generation of single link classes ...
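The excerpt does not spell out the algorithm it has in mind. Assuming the description is the graph-theoretic one (single link clusters are the connected components of the threshold graph at each level), one deliberately naive sketch recomputes the components from scratch at every distinct dissimilarity value. The function names and the "at most the level" linking rule are assumptions for the example.

```python
from itertools import combinations


def components(objects, dissim, level):
    """Connected components of the threshold graph, found by graph search."""
    unseen = set(objects)
    found = []
    while unseen:
        start = unseen.pop()
        stack, comp = [start], {start}
        while stack:
            u = stack.pop()
            # Every unseen object within `level` of u joins the component.
            linked = {v for v in unseen if dissim(u, v) <= level}
            unseen -= linked
            comp |= linked
            stack.extend(linked)
        found.append(frozenset(comp))
    return frozenset(found)


def naive_single_link_hierarchy(objects, dissim):
    """Recompute the clusters at every distinct level: simple but wasteful,
    since each of up to n(n-1)/2 levels repeats an O(n^2) search."""
    levels = sorted({dissim(a, b) for a, b in combinations(objects, 2)})
    hierarchy = []
    for level in levels:
        clusters = components(objects, dissim, level)
        # Record a level only when the partition actually changes.
        if not hierarchy or hierarchy[-1][1] != clusters:
            hierarchy.append((level, clusters))
    return hierarchy


# Hypothetical data: points on a line, dissimilarity = absolute difference.
for level, clusters in naive_single_link_hierarchy([0, 1, 3, 6],
                                                   lambda a, b: abs(a - b)):
    print(level, sorted(sorted(c) for c in clusters))
```

The waste is visible in the structure of the loop: nothing learned at one level is reused at the next, even though the clusters can only merge as the level rises.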

efficiency of implementation for a particular application ... An example of an ordered classification is a hierarchy ... The discussion about classification has been purposely vague up to this point ... Let me now be more specific about current and past approaches to classification, particularly in the context of information retrieval ... The cluster hypothesis. Before describing the battery of classification methods that are now used in information retrieval, I should like to discuss the underlying hypothesis for their use in document clustering ... A basic assumption in retrieval systems is that documents relevant to a request are separated from those which are not relevant, i ... (a) both of which are relevant to a request, and (b) one of which is relevant and the other non-relevant ... Summing over a set of requests gives the relative distribution of