Brookes also gives statistical reasons for preferring S2 to S1 which need not concern us here. Geometrically, S2 is the perpendicular distance from the origin to the operating characteristic (OC) line (see Figure 7.6).
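
The geometric reading can be made concrete with a small sketch. Assuming a handful of recall-fallout pairs observed at successive co-ordination levels (the figures below are invented), the pairs are plotted on normal-deviate (probit) axes, on which the Swets model takes the operating characteristic to be approximately a straight line; the perpendicular distance from the origin to the fitted line is then computed. This illustrates only the geometry described above, not Brookes's estimation procedure.

    import numpy as np
    from scipy.stats import norm

    # Invented recall/fallout pairs observed at several co-ordination levels.
    recall  = np.array([0.95, 0.85, 0.70, 0.50, 0.30])
    fallout = np.array([0.60, 0.40, 0.25, 0.12, 0.05])

    # Transform to normal deviates; the OC is (approximately) linear on these axes.
    x = norm.ppf(fallout)
    y = norm.ppf(recall)

    # Fit the line y = a*x + b by least squares.
    a, b = np.polyfit(x, y, 1)

    # Perpendicular distance from the origin to the line a*x - y + b = 0.
    distance = abs(b) / np.sqrt(a**2 + 1)
    print(f"slope = {a:.3f}, intercept = {b:.3f}, perpendicular distance = {distance:.3f}")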

Interestingly enough, Robertson[15] showed that S2 is simply related to the area under the Recall-Fallout curve. In fact, the area is a strictly increasing function of S2. It also has the appealing interpretation that it is equal to the percentage of correct choices a strategy will make when attempting to select from a pair of items, one drawn at random from the non-relevant set and one drawn from the relevant set. It does seem, therefore, that S2 goes a long way towards meeting the requirements laid down by Swets. However, the appropriateness of the model is questionable on a number of grounds. Firstly, the linearity of the OC curve does not necessarily imply that λ is normally distributed in both populations, although they will be 'similarly' distributed. Secondly, λ is assumed to be continuous, which is certainly not the case for the data examined by both Swets and Brookes, in which the co-ordination level used assumed only integer values. Thirdly, there is no evidence to suggest that, in the case of more sophisticated matching functions such as those used by the SMART system, the distributions will be similarly distributed, let alone normally. Finally, the choice of fallout rather than precision as the second variable is hard to justify. The reason is that the proportion of non-relevant documents retrieved in a large system is going to behave much like the ratio of 'non-relevant retrieved' to 'total documents in system'. For comparative purposes 'total documents' may be ignored, leaving us with 'non-relevant retrieved', which is complementary to 'relevant retrieved'. But then we may as well use precision instead of fallout.
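
Robertson's pairwise interpretation of the area is easy to verify numerically. The sketch below uses invented matching-function scores: it traces the recall-fallout curve by sweeping a threshold, computes the area under it by the trapezoidal rule, and compares this with the proportion of relevant/non-relevant pairs in which the relevant item receives the higher score (ties counted as one half). The two figures coincide, which is the content of the interpretation given above.

    import numpy as np

    rng = np.random.default_rng(0)
    # Invented matching-function scores for relevant and non-relevant documents.
    rel    = rng.normal(1.0, 1.0, size=200)
    nonrel = rng.normal(0.0, 1.0, size=1000)

    # Sweep a threshold through all observed scores to trace the recall-fallout curve.
    thresholds = np.sort(np.concatenate([rel, nonrel]))[::-1]
    recall  = [(rel >= t).mean() for t in thresholds]
    fallout = [(nonrel >= t).mean() for t in thresholds]

    # Area under the recall-fallout curve (trapezoidal rule).
    fx = np.array([0.0] + fallout + [1.0])
    ry = np.array([0.0] + recall + [1.0])
    area = np.sum(np.diff(fx) * (ry[1:] + ry[:-1]) / 2.0)

    # Proportion of (relevant, non-relevant) pairs ranked correctly, ties counted 1/2.
    pairs = (rel[:, None] > nonrel[None, :]).mean() \
            + 0.5 * (rel[:, None] == nonrel[None, :]).mean()

    print(f"area under curve = {area:.4f}, correct pair choices = {pairs:.4f}")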

The Robertson model - the logistic transformation

Robertson, in collaboration with Teather, has developed a model for estimating the probabilities corresponding to recall and fallout[17]. The estimation procedure is unusual in that, in making an estimate of these probabilities for a single query, it takes account of two things: first, the amount of data used to arrive at the estimates, and second, the averages of the estimates over all queries. The effect of this is to 'pull' an estimate closer to the overall mean if it appears to be an outlier, whilst at the same time counterbalancing the 'pull' in proportion to the amount of data used to make the estimate in the first place. There is now some evidence to show that this pulling-in-to-the-mean is statistically a reasonable thing to do[18].
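
The flavour of this pulling towards the mean can be conveyed by a much simpler shrinkage estimator than the one Robertson and Teather actually use. In the sketch below the per-query counts are invented and the weighting rule is a standard beta-binomial one chosen purely for illustration, not theirs: each query's recall estimate is a weighted compromise between its own raw proportion and the overall mean, with the weight on the raw proportion growing with the amount of data behind it.

    import numpy as np

    # Invented per-query data: relevant documents retrieved and total relevant
    # documents for each query.
    retrieved_rel = np.array([2, 9, 1, 15, 0])
    total_rel     = np.array([3, 10, 8, 20, 2])

    raw = retrieved_rel / total_rel                     # raw per-query recall estimates
    grand_mean = retrieved_rel.sum() / total_rel.sum()  # overall mean over all queries

    # Illustrative shrinkage: the grand mean acts as a prior with a notional
    # "prior sample size" m, so queries with little data are pulled strongly
    # towards the mean while well-supported queries hardly move.
    m = 5.0                                             # assumed prior strength
    shrunk = (retrieved_rel + m * grand_mean) / (total_rel + m)

    for r, s, n in zip(raw, shrunk, total_rel):
        print(f"raw = {r:.2f}  shrunk = {s:.2f}  (based on {n} relevant documents)")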
