Page 160


system so that if we were to adopt Δ as a measure of effectiveness we could be throwing away vital information needed to make an extrapolation to the performance of other systems.

The Cooper model - expected search length

In 1968, Cooper[20] stated: 'The primary function of a retrieval system is conceived to be that of saving its users to as great an extent as is possible, the labour of perusing and discarding irrelevant documents, in their search for relevant ones'. It is this 'saving' which is measured and is claimed to be the single index of merit for retrieval systems. In general the index is applicable to retrieval systems with ordered (or ranked) output. It roughly measures the search effort which one would expect to save by using the retrieval system as opposed to searching the collection at random. An attempt is made to take into account the varying difficulty of finding relevant documents for different queries. The index is calculated for a query of a precisely specified type. It is assumed that users are able to quantify their information need according to one of the following types:

(1) only one relevant document is wanted;

(2) some arbitrary number n is wanted;

(3) all relevant documents are wanted;

(4) a given proportion of the relevant documents is wanted, etc.

Thus, the index is a measure of performance for a query of given type. Here we shall restrict ourselves to Type 2 queries. For further details the reader is referred to Cooper[20].

The output of a search strategy is assumed to be a weak ordering of documents. I have defined this concept on page 118 in a different context. We start by first considering a special case, namely a simple ordering, which is a weak ordering such that for any two distinct elements e1 and e2 it is never the case that e1 R e2 and e2 R e1 (where R is the order relation). This simply means that all the documents in the output are ordered linearly, with no two documents at the same level of the ordering. The search length is now defined as the number of non-relevant documents a user must scan before his information need (in terms of the type quantification above) is satisfied. For example, consider a ranking of 20 documents in which the relevant ones are distributed as in Figure 7.7. A Type 2 query with n = 2 would have search length 2; with n = 6 it would have search length 3.
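
The definition of search length for a simple ordering can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name search_length and the ranking example_ranking are hypothetical. The ranking is not a copy of Figure 7.7; it was merely chosen to be consistent with the two search lengths quoted above (Y marks a relevant document, N a non-relevant one).

```python
from typing import Optional, Sequence


def search_length(ranking: Sequence[bool], n: int) -> Optional[int]:
    """Search length of a Type 2 query on a simple (linear) ordering.

    ranking[i] is True if the document at rank i + 1 is relevant.
    Returns the number of non-relevant documents scanned before the
    n-th relevant document is reached, or None if the ranking holds
    fewer than n relevant documents.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    found = 0          # relevant documents seen so far
    non_relevant = 0   # non-relevant documents scanned so far
    for is_relevant in ranking:
        if is_relevant:
            found += 1
            if found == n:
                return non_relevant
        else:
            non_relevant += 1
    return None


# Hypothetical ranking of 20 documents (an assumption, not Figure 7.7 itself).
example_ranking = [c == "Y" for c in "NYNYYYYNYN" + "NYNNNNNNNN"]

print(search_length(example_ranking, 2))  # n = 2
print(search_length(example_ranking, 6))  # n = 6
```

With this assumed ranking the two calls give 2 and 3, in agreement with the search lengths stated in the text. The sketch covers only the simple ordering; it does not handle documents tied at the same level of a weak ordering.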
