Page 155

and links IR measurements to a ready-made and well-developed statistical theory, it has not found general acceptance amongst workers in the field.

Before proceeding to an explanation of the Swets model, it is as well to quote in full the conditions that the desired measure of effectiveness is designed to meet. At the beginning of his 1967 report Swets states:

'A desirable measure of retrieval performance would have the following properties: First, it would express solely the ability of a retrieval system to distinguish between wanted and unwanted items - that is, it would be a measure of "effectiveness" only, leaving for separate consideration factors related to cost or "efficiency". Second, the desired measure would not be confounded by the relative willingness of the system to emit items - it would express discrimination power independent of any "acceptance criterion" employed, whether the criterion is characteristic of the system or adjusted by the user. Third, the measure would be a single number - in preference, for example, to a pair of numbers which may co-vary in a loosely specified way, or a curve representing a table of several pairs of numbers - so that it could be transmitted simply and immediately apprehended. Fourth, and finally, the measure would allow complete ordering of different performances, and assess the performance of any one system in absolute terms - that is, the metric would be a scale with a unit, a true zero, and a maximum value. Given a measure with these properties, we could be confident of having a pure and valid index of how well a retrieval system (or method) were performing the function it was primarily designed to accomplish, and we could reasonably ask questions of the form "Shall we pay X dollars for Y units of effectiveness?".'

He then goes on to claim that 'The measure I proposed [in 1963], one drawn from statistical decision theory, has the potential [my italics] to satisfy all four desiderata'. So, what is this measure?

To arrive at the measure, we must first discuss the underlying model. Swets defines the basic variables Precision, Recall, and Fallout in probabilistic terms.

Recall = an estimate of the conditional probability that an item will be retrieved given that it is relevant [we denote this P(B/A)].

Precision = an estimate of the conditional probability that an item will be relevant given that it is retrieved [i.e. P(A/B)].

Fallout = an estimate of the conditional probability that an item will be retrieved given that it is non-relevant [i.e. P(B/Ā)].
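The three definitions can be illustrated with a minimal sketch that estimates each conditional probability as a ratio of document counts, reading A as the event "relevant", B as the event "retrieved", and not-A as "non-relevant". The function name, its parameters, and the counts below are assumptions made for illustration only; they do not come from Swets's report.

```python
def retrieval_estimates(rel_ret, rel_not_ret, nonrel_ret, nonrel_not_ret):
    """Estimate recall, precision, and fallout from four document counts.

    A = item is relevant, B = item is retrieved (naming follows the text).
    A measure is None when its conditioning event never occurred.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    relevant = rel_ret + rel_not_ret            # size of A
    retrieved = rel_ret + nonrel_ret            # size of B
    non_relevant = nonrel_ret + nonrel_not_ret  # size of not-A

    return {
        "recall": ratio(rel_ret, relevant),          # estimates P(B/A)
        "precision": ratio(rel_ret, retrieved),      # estimates P(A/B)
        "fallout": ratio(nonrel_ret, non_relevant),  # estimates P(B/not-A)
    }


# Hypothetical counts for a 1000-document collection (invented for illustration).
estimates = retrieval_estimates(
    rel_ret=20, rel_not_ret=30, nonrel_ret=10, nonrel_not_ret=940
)
```

Recall and fallout condition on relevance and non-relevance respectively, so both are computed over fixed subsets of the collection, whereas precision conditions on what the system chose to retrieve.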
