Foundation*

Problems of measurement have arisen in physics, psychology, and, more recently, the social sciences. Clarification of these problems has been sought with the help of the theory of measurement. I shall attempt to do the same for information retrieval. My purpose is to construct a framework, based on the mathematical theory of measurement, within which measures of effectiveness for retrieval systems can be derived. The basic mathematical notions underlying the measurement ideas will be introduced, but for their deeper understanding the reader is referred to the excellent book by Krantz et al.[24]. It would be fair to say that the theory developed there is applied here. Also of interest are the books by Ellis[25] and Lieberman[26].

* The next three sections are substantially the same as those appearing in my paper: 'Foundations of evaluation', Journal of Documentation, 30, 365-373 (1974). They have been included with the kind permission of the Managing Editor of Aslib.

The problems of measurement in information retrieval differ from those encountered in the physical sciences in one important respect. In the physical sciences there is usually an empirical ordering of the quantities we wish to measure. For example, we can establish empirically by means of a scale which masses are equal, and which are greater or less than others. Such a situation does not hold in information retrieval. In the case of the measurement of effectiveness by precision and recall, there is no absolute sense in which one can say that one particular pair of precision-recall values is better or worse than some other pair, or, for that matter, that they are comparable at all. However, to leave it at that is to admit defeat. There is no reason why we cannot postulate a particular ordering, or, to put it more mildly, why we cannot show that a certain model for the measurement of effectiveness has acceptable properties. The immediate consequence of proceeding in this fashion is that each property ascribed to the model may be challenged. The only defence one has against this is that:

(1) all properties ascribed are consistent;

(2) they bring out into the open all the assumptions made in measuring effectiveness;

(3) each property has an acceptable interpretation;

(4) the model leads to a plausible measure of effectiveness.

It is as well to point out here that it does not lead to a unique measure, but it does show that certain classes of measures can be regarded as being equivalent.
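
The earlier remark that two precision-recall pairs need not be comparable can be illustrated with a minimal sketch. It assumes, purely for illustration, a componentwise (dominance) ordering in which a pair is better only if it is at least as good on both precision and recall; this ordering is not postulated in the text, and the name `dominance` is hypothetical. Under that assumption some pairs are ordered and others are not, which is the gap a postulated model of effectiveness would have to fill.

```python
from typing import Optional, Tuple

# A (precision, recall) pair, each value assumed to lie in [0, 1].
PRPair = Tuple[float, float]


def dominance(a: PRPair, b: PRPair) -> Optional[int]:
    """Compare two (precision, recall) pairs componentwise.

    Illustrative assumption only: a pair is 'better' when it is at least
    as good on both components and differs on at least one.

    Returns 1 if a dominates b, -1 if b dominates a, 0 if they are equal,
    and None if the pairs are incomparable under this ordering.
    """
    if a == b:
        return 0
    if a[0] >= b[0] and a[1] >= b[1]:
        return 1
    if a[0] <= b[0] and a[1] <= b[1]:
        return -1
    # One pair wins on precision, the other on recall: no verdict.
    return None


# Higher on both components: an ordering exists.
ordered = dominance((0.8, 0.4), (0.6, 0.3))
# Higher precision but lower recall: the ordering is silent.
incomparable = dominance((0.8, 0.2), (0.5, 0.6))
```

Here `ordered` is `1` while `incomparable` is `None`. Any single-valued measure of effectiveness implicitly decides such undecided cases, which is why each property ascribed to the model has to be stated and defended.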
