Page 52

have to be made of the file of object descriptions. There are a small number of clustering algorithms which only require one pass of the file of object descriptions. Hence the name 'Single-Pass Algorithm' for some of them. Basically they operate as follows:

(1) the object descriptions are processed serially;

(2) the first object becomes the cluster representative of the first cluster;

(3) each subsequent object is matched against all cluster representatives existing at its processing time;

(4) a given object is assigned to one cluster (or more if overlap is allowed) according to some condition on the matching function;

(5) when an object is assigned to a cluster the representative for that cluster is recomputed;

(6) if an object fails a certain test it becomes the cluster representative of a new cluster.

Once again the final classification is dependent on input parameters which can only be determined empirically (and which are likely to be different for different sets of objects) and must be specified in advance.

The simplest version of this kind of algorithm is probably one due to Hill[37]. Subsequently, many variations have been produced mainly the result of changes in the assignment rules and definition of cluster representatives. (See for example Rieber and Marathe[38], Johnson and Lafuente[39] and Etzweiler and Martin[40].)

Related to the single-pass approach is the algorithm of MacQueen[41] which starts with an arbitrary initial partition of the objects. Cluster representatives are computed for the members (sets) of the partition, and objects are reallocated to the nearest cluster representative.

A third type of algorithm is represented by the work of Dattola[42]. His algorithm is based on an earlier algorithm by Doyle. As in the case of MacQueen, it starts with an initial arbitrary partition and set of cluster representatives. The subsequent processing reallocates the objects, some ending up in a 'rag-bag' cluster (cf. Rocchio). After each reallocation the cluster representative is recomputed, but the new cluster representative will only replace the old one if the new representative turns out to be nearer in some sense to the objects in the new cluster than the old representative. Dattola's algorithm has been used extensively by Murray[43] for generating hierarchic classifications. Related to Dattola's approach is that due to Crouch[29]. Crouch spends more time obtaining the initial partition (he calls them categories) and the corresponding cluster representatives. The initial phase is termed the 'categorisation stage', which is followed by the 'classification stage'. The second stage proceeds to reallocate objects in the normal way. His work is of some interest because of the extensive comparisons hemade