Matching function

Similar concepts

Similarity Concept
Cluster based retrieval
Document clustering
Generality
Term clustering
Term
Document representative
E measure
Relevance
Data retrieval systems

Pages with this concept

Similarity Page Snapshot
51 restricting the number of clusters and by bounding the size of each cluster ...Rather than give a detailed account of all the heuristic algorithms,I shall instead discuss some of the main types and refer the reader to further developments by citing the appropriate authors ...The most important concept is that of cluster representative variously called cluster profile,classification vector,or centroid ...1 the number of clusters desired;2 a minimum and maximum size for each cluster;3 a threshold value on the matching function,below which an object will not be included in a cluster;4 the control of overlap between clusters;5 an arbitrarily chosen objective function which is optimised ...Almost all of the algorithms are iterative,i ...Probably the most important of this kind of algorithm is Rocchio s clustering algorithm [36]which was developed on the SMART project ...Most of these algorithms aim at reducing the number of passes that
52 have to be made of the file of object descriptions ...1 the object descriptions are processed serially;2 the first object becomes the cluster representative of the first cluster;3 each subsequent object is matched against all cluster representatives existing at its processing time;4 a given object is assigned to one cluster or more if overlap is allowed according to some condition on the matching function;5 when an object is assigned to a cluster the representative for that cluster is recomputed;6 if an object fails a certain test it becomes the cluster representative of a new cluster ...Once again the final classification is dependent on input parameters which can only be determined empirically and which are likely to be different for different sets of objects and must be specified in advance ...The simplest version of this kind of algorithm is probably one due to Hill [37]...Related to the single pass approach is the algorithm of MacQueen [41]which starts with an arbitrary initial partition of the objects ...A third type of algorithm is represented by the work of Dattola [42]...
97 then to satisfy the K 1 AND K 2 part we intersect the K 1 and K 2 lists,to satisfy the K 3 AND NOT K 4 part we subtract the K 4 list from the K 3 list ...A slight modification of the full Boolean search is one which only allows AND logic but takes account of the actual number of terms the query has in common with a document ...For the same example as before with the query Q K 1 AND K 2 AND K 3 we obtain the following ranking:Co ordination level 3 D 1,D 2 2 D 3 1 D 4 In fact,simple matching may be viewed as using a primitive matching function ...Matching functions Many of the more sophisticated search strategies are implemented by means of a matching function ...There are many examples of matching functions in the literature ...If M is the matching function,D the set of keywords representing the document,and Q the set representing the query,then:
112 of presenting the basic theory;I have chosen to present it in such a way that connections with other fields such as pattern recognition are easily made ...The fundamental mathematical tool for this chapter is Bayes Theorem:most of the equations derive directly from it ...This was recognised by Maron in his The Logic Behind a Probabilistic Interpretation as early as 1964 [4]...Remember that the basic instrument we have for trying to separate the relevant from the non relevant documents is a matching function,whether it be that we are in a clustered environment or an unstructured one ...It will be assumed in the sequel that the documents are described by binary state attributes,that is,absence or presence of index terms ...Estimation or calculation of relevance When we search a document collection,we attempt to retrieve relevant documents without retrieving non relevant ones ...