Probabilistic search strategies have not been investigated much either*, although such strategies have been tried with some effect in the fields of pattern recognition and automatic medical diagnosis.
Of course, in these fields the object descriptions are more detailed than are the document descriptions in IR, which may mean that for these strategies to work in IR we may require the document descriptions to increase in detail.
* The work described in Chapter 6 goes some way to remedying this situation.
In Chapter 5 I mentioned that bottom-up search strategies are apparently more successful than the more traditional top-down searches.
This leads me to speculate that a spanning tree on the documents could be an effective structure for guiding a search for relevant documents.
A search strategy based on a spanning tree for the documents may well be able to use the dependence information derived from the spanning tree for the index terms.
An interesting research problem would be to see if by allowing some kind of interaction between the two spanning trees one could improve retrieval effectiveness.
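As a rough illustration of what a spanning tree on the documents might look like, the following sketch builds a maximum spanning tree over pairwise document similarities. The choice of Jaccard overlap as the similarity measure, the use of Kruskal's algorithm, and the toy documents are all assumptions made for the example; nothing in the text above fixes them.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def maximum_spanning_tree(docs):
    """Kruskal's algorithm, taking the most similar document pair first.

    docs maps a document id to its set of index terms (hypothetical data).
    Returns a list of (similarity, doc_a, doc_b) tree edges.
    """
    ids = sorted(docs)
    edges = sorted(
        ((jaccard(docs[a], docs[b]), a, b) for a, b in combinations(ids, 2)),
        key=lambda edge: -edge[0],
    )
    parent = {d: d for d in ids}

    def find(x):
        # Follow parent links to the component root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for sim, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((sim, a, b))
    return tree


# Toy collection, invented purely for illustration.
docs = {
    "d1": {"retrieval", "cluster", "search"},
    "d2": {"retrieval", "cluster", "tree"},
    "d3": {"tree", "graph"},
    "d4": {"probability", "diagnosis"},
}
tree = maximum_spanning_tree(docs)
for sim, a, b in tree:
    print(f"{a} -- {b}  similarity {sim:.2f}")
```

A search could then start at the document most similar to the query and move along tree edges, which is one possible reading of a bottom-up strategy guided by this structure.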
4. Simulation
The three areas of research discussed so far could fruitfully be explored through a simulation model.
We now have sufficiently detailed knowledge to enable us to specify a reasonable simulation model of an IR system.
For example, the shape of the distributions of keywords throughout a document collection is known to influence retrieval effectiveness.
By varying these distributions what can one expect to happen to document or keyword classifications? It may be possible to devise more efficient file structures by studying the performance of various file structures while simulating different keyword distributions.
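A minimal sketch of such an experiment follows. It assumes a Zipf-like keyword distribution controlled by a single exponent, a fixed number of keyword draws per document, and inverted-file posting-list lengths as the quantity of interest; the function names and parameter values are hypothetical and chosen only for illustration.

```python
import random


def simulate_collection(n_docs, vocab_size, draws_per_doc, exponent, seed=0):
    """Generate documents as sets of keyword ids.

    Keyword of rank r is drawn with weight 1 / r**exponent, so exponent 0.0
    gives a uniform distribution and larger values give a more skewed one.
    """
    rng = random.Random(seed)
    vocab = list(range(vocab_size))
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    return [
        set(rng.choices(vocab, weights=weights, k=draws_per_doc))
        for _ in range(n_docs)
    ]


def posting_lengths(collection, vocab_size):
    """Number of documents indexed by each keyword (inverted-file list sizes)."""
    lengths = [0] * vocab_size
    for doc in collection:
        for term in doc:
            lengths[term] += 1
    return lengths


results = {}
for exponent in (0.0, 1.0, 2.0):
    collection = simulate_collection(200, 50, 10, exponent)
    lengths = posting_lengths(collection, 50)
    unused = sum(1 for n in lengths if n == 0)
    results[exponent] = (max(lengths), unused)
    print(f"exponent={exponent}: longest list={max(lengths)}, unused keywords={unused}")
```

Comparing the longest posting list and the number of unused keywords across exponents gives a first, crude indication of how the shape of the distribution might affect a file structure; a fuller study would substitute the file structures and cost measures actually under investigation.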
One major open problem is the simulation of relevance.
To my knowledge no one has been able to simulate the characteristics of relevant documents successfully.
Once this problem has been cracked, the way will be open to studying such hypotheses as the Cluster and Association Hypotheses by simulation.
5. Evaluation
This has been the most troublesome area in IR.
It is now generally agreed that one should be able to do some sort of cost-benefit, or efficiency-effectiveness, analysis of a retrieval system.