Concept: Automatic keyword clustering

Similar concepts

Similarity

Concept

Experimental information retrieval

Information retrieval definition

Operational information retrieval

Document clustering

Automatic document classification

Cluster based retrieval

Automatic classification

Information retrieval system

Automatic indexing

Pages with this concept

Similarity

Page

Snapshot

The second criterion for choice is the efficiency of the clustering process in terms of speed and storage requirements ... Efficiency is really a property of the algorithm implementing the cluster method ... In the main, two distinct approaches to clustering can be identified: (1) the clustering is based on a measure of similarity between the objects to be clustered; (2) the cluster method proceeds directly from the object descriptions ... The most obvious examples of the first approach are the graph theoretic methods which define clusters in terms of a graph derived from the measure of similarity ... A string is a connected sequence of objects from some starting point ... A connected component is a set of objects such that each object is connected to at least one other member of the set and the set is maximal with respect to this property ... A maximal complete subgraph is a subgraph such that each node is connected to every other node in the subgraph and the set is maximal with respect to this property, i ... node were included anywhere the completeness condition would be violated ... A large class of hierarchic cluster methods is based on the initial measurement of similarity ...
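The two graph theoretic definitions in this snapshot can be illustrated with a small sketch. It assumes, purely for illustration, that the graph is obtained by linking two objects whenever their similarity reaches a chosen threshold; the matrix `sim`, the threshold value, and all function names are hypothetical. Maximal complete subgraphs are enumerated with the basic Bron-Kerbosch procedure, which is one standard way of doing this and is not taken from the source.

```python
from itertools import combinations


def threshold_graph(sim, threshold):
    """Link objects i and j when sim[i][j] >= threshold (assumed rule)."""
    n = len(sim)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if sim[i][j] >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Maximal sets of objects joined by some path of links.

    Unlike the definition in the text, isolated objects are returned
    as singleton components.
    """
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components


def maximal_cliques(adj):
    """Maximal complete subgraphs via Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already explored nodes
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical symmetric similarity matrix for five objects.
sim = [
    [1.0, 0.8, 0.7, 0.1, 0.0],
    [0.8, 1.0, 0.9, 0.2, 0.1],
    [0.7, 0.9, 1.0, 0.6, 0.0],
    [0.1, 0.2, 0.6, 1.0, 0.0],
    [0.0, 0.1, 0.0, 0.0, 1.0],
]
adj = threshold_graph(sim, 0.5)
components = connected_components(adj)
cliques = maximal_cliques(adj)
```

With this data, objects 0 to 3 form one connected component while the maximal complete subgraphs are smaller and overlap at object 2, which shows how much stricter the second definition is.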

Page 141

that this principle works so well is not yet clear but see Yu and Salton's recent theoretical paper [39] ... The connection with term clustering was already made earlier on in the chapter ... It should be clear now that the quantitative model embodies within one theory such diverse topics as term clustering, early association analysis, document frequency weighting, and relevance weighting ...

that a search strategy will infallibly find the class of documents containing the relevant documents ... Note that the Cluster Hypothesis refers to given document descriptions ... As can be seen from the above, the Cluster Hypothesis is a convenient way of expressing the aim of such operations as document clustering ... The use of clustering in information retrieval There are a number of discussions in print now which cover the use of clustering in IR ... In choosing a cluster method for use in experimental IR, two, often conflicting, criteria have frequently been used ... (1) the method produces a clustering which is unlikely to be altered drastically when further objects are incorporated, i ... (2) the method is stable in the sense that small errors in the description of the objects lead to small changes in the clustering; (3) the method is independent of the initial ordering of the objects ... These conditions have been adapted from Jardine and Sibson [2] ...

In practice, one seeks some sort of optimal trade-off between representation and discrimination ... The emphasis on representation leads to what one might call a document orientation: that is, a total preoccupation with modelling what the document is about ... This point of view is also adopted by those concerned with defining a concept of information; they assume that once this notion is properly explicated a document can be represented by the information it contains [37] ... The emphasis on discrimination leads to a query orientation ... Automatic keyword classification Many automatic retrieval systems rely on thesauri to modify queries and document representatives to improve the chance of retrieving relevant documents ...
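As a rough illustration of how a keyword classification might be used to modify a query, the sketch below replaces each query keyword by all members of the classes it belongs to. The classes, the class identifiers, and the function names are hypothetical, and the snapshot does not say that any particular system works this way.

```python
def build_class_index(keyword_classes):
    """Map each keyword to the ids of the classes that contain it."""
    index = {}
    for class_id, terms in keyword_classes.items():
        for term in terms:
            index.setdefault(term, set()).add(class_id)
    return index


def expand_query(query_terms, keyword_classes, index):
    """Add every keyword that shares a class with a query keyword.

    Keywords that belong to no class are kept unchanged.
    """
    expanded = set(query_terms)
    for term in query_terms:
        for class_id in index.get(term, ()):
            expanded |= keyword_classes[class_id]
    return expanded


# Hypothetical keyword classes, e.g. the output of a term clustering step.
keyword_classes = {
    "c1": {"retrieval", "search", "lookup"},
    "c2": {"cluster", "classification", "grouping"},
}
index = build_class_index(keyword_classes)
expanded = expand_query({"search", "cluster", "thesaurus"}, keyword_classes, index)
```

The same substitution could be applied to document representatives instead of queries; which side is expanded is a design choice the sketch leaves open.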

Page 134

which from a computational point of view would simplify things enormously ... An alternative way of using the dependence tree Association Hypothesis Some of the arguments advanced in the previous section can be construed as implying that the only dependence tree we have enough information to construct is the one on the entire document collection ... The basic idea underlying term clustering was explained in Chapter 2 ... If an index term is good at discriminating relevant from non-relevant documents then any closely associated index term is also likely to be good at this ...
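To make "closely associated index term" concrete, the following sketch scores association between terms by how much their document sets overlap. The Dice coefficient is used here only as a simple stand-in for whatever association measure the source has in mind; the toy collection and all function names are assumptions.

```python
from collections import defaultdict


def postings(docs):
    """Map each index term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def dice(a, b):
    """Dice coefficient between two sets of document ids."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def associated_terms(term, index, min_assoc):
    """Terms whose association with `term` is at least min_assoc,
    strongest first, ties broken alphabetically."""
    base = index.get(term, set())
    scored = [
        (other, dice(base, docs_with))
        for other, docs_with in index.items()
        if other != term
    ]
    return sorted(
        (pair for pair in scored if pair[1] >= min_assoc),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical collection: document id -> set of index terms.
docs = {
    "d1": {"cluster", "similarity", "graph"},
    "d2": {"cluster", "similarity"},
    "d3": {"cluster", "hierarchy"},
    "d4": {"index", "weighting"},
}
index = postings(docs)
neighbours = associated_terms("cluster", index, 0.5)
```

Under the Association Hypothesis, if "cluster" turned out to discriminate well between relevant and non-relevant documents, the terms in `neighbours` would be the natural candidates to add to the query.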