Classification methods
Let me start with a description of the kind of data for which classification methods are appropriate.
The data consists of objects and their corresponding descriptions.
The objects may be documents, keywords, handwritten characters, or species (in the last case the objects themselves are classes as opposed to individuals).
The descriptors come under various names depending on their structure:
(1) multi-state attributes (e.g. colour)
(2) binary-state (e.g. keywords)
(3) numerical (e.g. hardness scale, or weighted keywords)
(4) probability distributions.
The fourth category of descriptors is applicable when the objects are classes.
For example, the leaf width of a species of plants may be described by a normal distribution of a certain mean and variance.
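To make the four kinds of descriptor concrete, here is a minimal sketch of how one object description might be held in memory. The class name `ObjectDescription`, its field names and the example values are hypothetical, chosen only for illustration. The fourth kind of descriptor uses Python's `statistics.NormalDist`, on the assumption (as in the leaf-width example) that the distribution is normal.

```python
from dataclasses import dataclass, field
from statistics import NormalDist


@dataclass
class ObjectDescription:
    """Hypothetical container for the four kinds of descriptor."""
    name: str
    # (1) multi-state attributes, e.g. {"colour": "green"}
    multi_state: dict[str, str] = field(default_factory=dict)
    # (2) binary-state descriptors: the set of keywords present
    binary: set[str] = field(default_factory=set)
    # (3) numerical descriptors, e.g. weighted keywords or a hardness value
    numerical: dict[str, float] = field(default_factory=dict)
    # (4) probability distributions, used when the object is itself a class
    distributions: dict[str, NormalDist] = field(default_factory=dict)


# A document described by keywords and (illustrative) keyword weights.
doc = ObjectDescription(
    name="doc-1",
    binary={"retrieval", "clustering"},
    numerical={"retrieval": 0.8, "clustering": 0.3},
)

# A species: leaf width summarised by a normal distribution with
# illustrative mean 4.0 and standard deviation 0.5 (variance 0.25).
species = ObjectDescription(
    name="species-A",
    multi_state={"colour": "green"},
    distributions={"leaf_width": NormalDist(mu=4.0, sigma=0.5)},
)
```

Storing each kind of descriptor in its own field keeps the distinction between individuals (descriptors 1 to 3) and classes (descriptor 4) explicit, which matters once a similarity measure has to treat them differently.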
It is in an attempt to summarise and simplify this kind of data that classification methods are used.
Several excellent surveys of classification methods now exist; to name but a few: Ball[21], Cormack[14] and Dorofeyuk[22].
In fact, methods of classification are now so numerous that Good[23] has found it necessary to give a classification of classification.
Sparck Jones[24] has provided a very clear, intuitive breakdown of classification methods in terms of some general characteristics of the resulting classificatory system.
In what follows, the primitive notion of 'property' will mean a feature of an object.
I quote:
(1) Relation between properties and classes
(a) monothetic
(b) polythetic
(2) Relation between objects and classes
(a) exclusive
(b) overlapping
(3) Relation between classes and classes
(a) ordered
(b) unordered
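Two of these distinctions can be checked mechanically. The following is a minimal sketch, assuming each object is described by a set of binary properties and each class is given as a set of object identifiers; all names and data are hypothetical. It tests whether a classification is exclusive or overlapping, and whether a class is monothetic in the sense of being defined by a set of properties that is both necessary and sufficient for membership (the sense used in Beckner's definition). It does not attempt to encode the conditions for a polythetic class.

```python
from itertools import combinations

# Hypothetical toy data: object -> set of properties it possesses.
properties = {
    "o1": {"a", "b"},
    "o2": {"a", "c"},
    "o3": {"b", "c"},
    "o4": {"d"},
}


def is_exclusive(classes):
    """True if no object belongs to more than one class."""
    return all(not (s & t) for s, t in combinations(classes.values(), 2))


def is_monothetic(members, props, objects):
    """True if having every property in `props` is necessary and
    sufficient for membership: an object has all of `props`
    exactly when it is in `members`."""
    return all((props <= objects[o]) == (o in members) for o in objects)


def has_monothetic_definition(members, objects):
    """True if some property set defines `members` monothetically.
    Any such set must be shared by every member, so it suffices to
    test the intersection of the members' properties."""
    if not members:
        return False
    common = set.intersection(*(objects[o] for o in members))
    return is_monothetic(members, common, objects)


exclusive_classes = {"K1": {"o1", "o2"}, "K2": {"o3", "o4"}}
overlapping_classes = {"K1": {"o1", "o2", "o3"}, "K2": {"o3", "o4"}}
```

Here `{"o1", "o2"}` is monothetic (property `a` is necessary and sufficient), whereas `{"o1", "o2", "o3"}` is not: each member has two of the properties `a`, `b`, `c`, but no property is shared by all three.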
The first category has been explored thoroughly by numerical taxonomists.
An early statement of the distinction between monothetic and polythetic is given by Beckner[25]: 'A class is ordinarily defined by reference to a set of properties which are both necessary and sufficient (by stipulation) for membership in the class.
It is possible, however, to define a group K in terms of a set G of properties f1, f2, . . . , fn in a different manner.
Suppose we have an aggregate of individuals (we shall not yet call them a class) such that