Page 101 Concepts and similar pages

Concepts

Similarity Concept
Maximal predictor
Document clustering
Retrieval effectiveness
Association Measures
Document representative
Measures of association
Effectiveness
E measure
Normalised association measures
Clustering

Similar pages

Similarity Page Snapshot
100 A and B are two clusters ... that their corresponding documents are less dissimilar than some specified level of dissimilarity ... Let us now look at other ways of representing clusters ... where Di is usually the Euclidean norm, i ... More often than not the documents are not represented by numerical vectors but by binary vectors or, equivalently, sets of keywords ... remember n is the number of documents in the cluster by the following procedure ...
102 This can be rewritten as The expression will be minimised, thus maximising the number of correct predictions, when C c 1, ... is a minimum ... So in other words a keyword will be assigned to a cluster representative if it occurs in more than half the member documents ... Although the main reason for constructing these cluster representatives is to lead a search strategy to relevant documents, it should be clear that they can also be used to guide a search to documents meeting some condition on the matching function ... Di M(Q, Di) > T. For more details about the evaluation of cluster representative 3 for this purpose the reader should consult the work of Yu et al ... One major objection to most work on cluster representatives is that it treats the distribution of keywords in clusters as independent ... Finally, it should be noted that cluster methods which proceed directly from document descriptions to the classification without first
41 keyword is indicated by a zero or one in the i-th position respectively ... where summation is over the total number of different keywords in the document collection ... Salton considered document representatives as binary vectors embedded in an n-dimensional Euclidean space, where n is the total number of index terms ... can then be interpreted as the cosine of the angular separation of the two binary vectors X and Y ... where (X, Y) is the inner product and ... X x 1, ... we get Some authors have attempted to base a measure of association on a probabilistic model [18] ... When xi and xj are independent, P(xi) P(xj) = P(xi, xj) and so I(xi, xj) = 0 ...
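
The snapshot for page 41 describes the cosine of the angular separation of two binary vectors X and Y, built from their inner product. A minimal sketch of that idea follows, assuming documents are given as equal-length lists of zeros and ones; the function name `cosine_binary` and the choice to return 0.0 for an all-zero vector are illustrative assumptions, not taken from the source.

```python
import math


def cosine_binary(x, y):
    """Cosine of the angle between two equal-length binary vectors.

    Hypothetical helper for illustration: x and y are lists of 0/1 values,
    where position i marks the absence or presence of the i-th keyword.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Inner product: for binary vectors this counts the shared keywords.
    inner = sum(a * b for a, b in zip(x, y))
    # Euclidean norms: for binary vectors, the square root of the keyword count.
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Assumption: treat a document with no keywords as having zero association.
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return inner / (norm_x * norm_y)


if __name__ == "__main__":
    x = [1, 1, 0, 1]  # three keywords present
    y = [1, 0, 0, 1]  # two keywords present, both shared with x
    print(cosine_binary(x, y))
```

For the two vectors in the usage example, the inner product is 2 and the norms are the square roots of 3 and 2, so the value is 2 divided by the square root of 6.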