behaviour of any one of the components depends in only an aggregate way on the behaviour of the other components.
Now it may be that this is an appropriate analogy for looking at the dynamic behaviour (e.g. updating, change of vocabulary) of document or keyword classifications.
Very little is in fact known about the behaviour of classification structures in dynamic environments.
2. File structures
The efficiency of an information retrieval system depends on the file structure chosen and the way it is used.
Inverted files have been rather popular in IR systems.
Certainly, in systems based on unweighted keywords especially where queries are formulated in Boolean expressions, an inverted file can give very fast response.
Unfortunately, it is not possible to achieve an efficient adaptation of an inverted file to deal with the matching of more elaborate document and query descriptions such as weighted keywords.
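The case in which an inverted file does respond quickly, unweighted keywords combined by Boolean operators, can be illustrated with a minimal sketch. It is not a description of any particular system: the three-document collection and the function names are assumptions made for the example, and each keyword's list of documents is held as an in-memory set.

```python
from collections import defaultdict


def build_inverted_file(documents):
    """Map each keyword to the set of ids of the documents indexed by it."""
    inverted = defaultdict(set)
    for doc_id, keywords in documents.items():
        for keyword in keywords:
            inverted[keyword].add(doc_id)
    return dict(inverted)


def boolean_and(inverted, *keywords):
    """Documents indexed by every keyword: intersect the keyword lists."""
    postings = [inverted.get(k, set()) for k in keywords]
    return set.intersection(*postings) if postings else set()


def boolean_or(inverted, *keywords):
    """Documents indexed by at least one keyword: unite the keyword lists."""
    postings = [inverted.get(k, set()) for k in keywords]
    return set.union(*postings) if postings else set()


def boolean_not(inverted, all_ids, keyword):
    """Documents in the collection that are not indexed by the keyword."""
    return set(all_ids) - inverted.get(keyword, set())


# Hypothetical collection: document id -> unweighted keywords.
documents = {
    1: {"retrieval", "clustering"},
    2: {"retrieval", "files"},
    3: {"files", "storage"},
}
inverted = build_inverted_file(documents)

# Query: (retrieval AND files) OR storage
result = boolean_and(inverted, "retrieval", "files") | boolean_or(inverted, "storage")
```

Only the lists for the keywords named in the query are touched, and the answer is obtained by set operations on them, which is why the response can be fast. Nothing in this structure records how strongly a keyword describes a document, so it offers no direct support for matching weighted descriptions.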
Research into file structures which could efficiently cope with the more complicated document and query descriptions is still needed.
The only way of getting at this may be to start with a document classification and investigate file structures appropriate for it.
Along this line it might well prove fruitful to investigate the relationship between document clustering and relational data bases which organise their data according to n-ary relations.
There are many more problems in this area which are of interest to IR systems.
For example, the physical organisation of large hierarchic structures appropriate to information retrieval is an interesting problem.
How is one to optimise allocation of storage to a hierarchy if it is to be stored on devices which have different speeds of access?
3. Search strategies
So far fairly simple search strategies have been tried.
They have varied between simple serial searches and the cluster-based strategies described in Chapter 5.
Tied up with each cluster-based strategy is its method of cluster representation.
By changing the cluster representative, the decision and stopping rules of search strategies can usually also be changed.
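The dependence of a search on its cluster representative can be illustrated with a small sketch. Everything in it is an assumption made for the example rather than a strategy taken from the text: the two representatives (a component-wise mean and a component-wise maximum), the unnormalised inner product used for matching, the tiny hierarchy, and the particular rules (decision rule: move to the best-matching subcluster; stopping rule: halt when the match fails to improve).

```python
def centroid(vectors):
    """Representative as the component-wise mean of the member vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def maximal(vectors):
    """Representative as the component-wise maximum of the member vectors."""
    return [max(column) for column in zip(*vectors)]


def dot(a, b):
    """Unnormalised inner product used as the matching function."""
    return sum(x * y for x, y in zip(a, b))


def leaves(node):
    """All document ids stored at or below a node of the hierarchy."""
    if "docs" in node:
        return list(node["docs"])
    found = []
    for child in node["children"]:
        found.extend(leaves(child))
    return found


def search(node, query, vectors, representative):
    """Top-down search: follow the best-matching child until no improvement."""
    best_score = float("-inf")
    while "children" in node:
        scored = []
        for child in node["children"]:
            rep = representative([vectors[d] for d in leaves(child)])
            scored.append((dot(query, rep), child))
        score, best = max(scored, key=lambda pair: pair[0])  # decision rule
        if score <= best_score:                              # stopping rule
            break
        best_score, node = score, best
    return sorted(leaves(node))


# Hypothetical data: binary keyword vectors and a two-level hierarchy.
vectors = {1: [1, 1, 0], 2: [1, 0, 0], 3: [0, 1, 1], 4: [0, 1, 0]}
tree = {"children": [
    {"children": [{"docs": [1]}, {"docs": [2]}]},
    {"docs": [3, 4]},
]}
query = [1, 1, 0]

with_centroid = search(tree, query, vectors, centroid)
with_maximal = search(tree, query, vectors, maximal)
```

With these data the centroid search descends to the single document 1, whereas the maximum-based search stops one level higher and retrieves documents 1 and 2: the same decision and stopping rules act at different points once the representative is changed.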
One approach that does not seem to have been tried would involve having a number of cluster representatives each perhaps derived from the data according to different principles.