is another example of a matching function.
It is of course the same as Dice's coefficient of Chapter 3.
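As a rough illustration of Dice's coefficient, here is a minimal Python sketch. It assumes the query and the document have each been reduced to a set of keywords X and Y, so that the coefficient is 2|X ∩ Y| / (|X| + |Y|). The function name and the convention of returning 0 for two empty sets are our own choices, not taken from the text.

```python
def dice_coefficient(query_terms: set[str], doc_terms: set[str]) -> float:
    """Dice's coefficient 2|X ∩ Y| / (|X| + |Y|) for two keyword sets."""
    total = len(query_terms) + len(doc_terms)
    if total == 0:
        return 0.0  # assumed convention: two empty sets are treated as non-matching
    return 2 * len(query_terms & doc_terms) / total


# Usage: two of the three query keywords occur in a four-keyword document.
q = {"retrieval", "search", "cluster"}
d = {"retrieval", "search", "index", "file"}
print(dice_coefficient(q, d))  # 2*2 / (3+4) = 4/7, about 0.571
```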
A popular one, used by the SMART project and called there the cosine correlation, assumes that the document and query are represented as numerical vectors in t-space, that is, Q = (q1, q2, ..., qt) and D = (d1, d2, ..., dt), where qi and di are the numerical weights associated with keyword i.
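The cosine correlation is now simply

$$\frac{\sum_{i=1}^{t} q_i d_i}{\left(\sum_{i=1}^{t} q_i^2 \sum_{i=1}^{t} d_i^2\right)^{1/2}}$$

or, in the notation for a vector space with a Euclidean norm,

$$\frac{(Q, D)}{\|Q\|\,\|D\|} = \cos\theta$$

where θ is the angle between vectors Q and D.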
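The cosine correlation can be computed directly from the two weight vectors. The sketch below is a minimal Python version, assuming Q and D are given as equal-length lists of t weights; the function name, the example weights, and the choice to return 0 when either vector is all zeros (where the formula is undefined) are illustrative assumptions.

```python
import math
from typing import Sequence


def cosine_correlation(q: Sequence[float], d: Sequence[float]) -> float:
    """Cosine of the angle between weight vectors Q and D in t-space."""
    if len(q) != len(d):
        raise ValueError("Q and D must have the same number of keywords t")
    dot = sum(qi * di for qi, di in zip(q, d))  # numerator: sum of qi * di
    norm_product = math.sqrt(sum(qi * qi for qi in q) * sum(di * di for di in d))
    if norm_product == 0.0:
        return 0.0  # assumed convention: an all-zero vector matches nothing
    return dot / norm_product


# Usage: t = 3 keywords with hypothetical weights.
Q = [1.0, 0.0, 2.0]
D = [2.0, 1.0, 2.0]
print(cosine_correlation(Q, D))  # 6 / sqrt(5 * 9), about 0.894
```

Because the value is normalised by both vector lengths, scaling all the weights of a document by a constant leaves its cosine correlation with the query unchanged.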
Serial search
Although serial searches are acknowledged to be slow, they are frequently still used as parts of larger systems.
They also provide a convenient demonstration of the use of matching functions.
Suppose there are N documents Di in the system. A serial search then proceeds by calculating the N values M(Q, Di), from which the set of documents to be retrieved is determined.
There are two ways of doing this:
(1) the matching function is given a suitable threshold, retrieving the documents above the threshold and discarding the ones below.
If T is the threshold, then the retrieved set B is the set {Di |M(Q, Di) > T}.
(2) the documents are ranked in decreasing order of matching function value, so that the best match has rank position 1.
A rank position R is chosen as cut-off and all documents ranked above it are retrieved, so that B = {Di | r(i) < R}, where r(i) is the rank position assigned to Di.
The hope in each case is that the relevant documents are contained in the retrieved set.
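The two ways of cutting the retrieved set B from the N values M(Q, Di) can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes documents and the query are keyword sets, uses Dice's coefficient only as an example matching function, and assumes rank 1 goes to the highest M(Q, Di), which is what the condition r(i) < R requires. The function names and the sample collection are hypothetical, and retrieved documents are returned as their indices i.

```python
from typing import Callable, List, Sequence, Set

# A matching function M(Q, D) -> float; any matching function could be plugged in.
Match = Callable[[Set[str], Set[str]], float]


def dice(q: Set[str], d: Set[str]) -> float:
    """Dice's coefficient, used here only as an example M(Q, D)."""
    total = len(q) + len(d)
    return 2 * len(q & d) / total if total else 0.0


def serial_threshold(q: Set[str], docs: Sequence[Set[str]],
                     match: Match, threshold: float) -> List[int]:
    """Method (1): scan all N documents; B = {Di | M(Q, Di) > T}."""
    return [i for i, d in enumerate(docs) if match(q, d) > threshold]


def serial_rank(q: Set[str], docs: Sequence[Set[str]],
                match: Match, cutoff: int) -> List[int]:
    """Method (2): rank by decreasing M(Q, Di), rank 1 = best; B = {Di | r(i) < R}."""
    scores = [match(q, d) for d in docs]  # the N values M(Q, Di)
    # sorted() is stable, so documents with equal scores keep their original order
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [i for rank, i in enumerate(order, start=1) if rank < cutoff]


# Usage with a hypothetical four-document collection.
docs = [
    {"retrieval", "search", "index"},    # D0: M = 4/6
    {"cluster", "file"},                 # D1: M = 2/5
    {"search", "cluster", "retrieval"},  # D2: M = 1.0
    {"index", "file"},                   # D3: M = 0.0
]
q = {"retrieval", "search", "cluster"}
print(serial_threshold(q, docs, dice, 0.5))  # [0, 2]
print(serial_rank(q, docs, dice, 3))         # [2, 0]
```

Both functions evaluate M(Q, Di) for every one of the N documents, which is why serial search is slow on large collections; the two methods differ only in how B is cut from those values. Note that with the cut-off written as r(i) < R, exactly R − 1 documents are retrieved (when N ≥ R − 1).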