72

Its main advantages are:

(1) it is easy to implement;

(2) it provides fast access to the next record using lexicographic order.

Its disadvantages:

(1) it is difficult to update - inserting a new record may require moving a large proportion of the file;

(2) random access is extremely slow.

Sometimes a file is considered to be sequentially organised despite the fact that it is not ordered according to any key. Perhaps the date of acquisition is considered to be the key value, the newest entries are added to the end of the file and therefore pose no difficulty to updating.

Inverted files

The importance of this file structure will become more apparent when Boolean Searches are discussed in the next chapter. For the moment we limit ourselves to describing its structure.

An inverted file is a file structure in which every list contains only one record. Remember that a list is defined with respect to a keyword K, so every K-list contains only one record. This implies that the directory will be such that ni = hi for all i, that is, the number of records containing Ki will equal the number of Ki-lists. So the directory will have an address for each record containing Ki . For document retrieval this means that given a keyword we can immediately locate the addresses of all the documents containing that keyword. For the previous example let us assume that a non-black entry in the field corresponding to an attribute indicates the presence of a keyword and a black entry its absence. Then the directory will point to the file in the way shown in Figure 4.3. The definition of an inverted files does not require that the addresses in the directory are in any order. However, to facilitate operations such as conjunction ('and') and disjunction ('or') on any two inverted lists, the addresses are normally kept in record number order. This means that 'and' and 'or' operations can be performed with one pass through both lists. The penalty we pay is of course that the inverted file becomes slower to update.

Index-sequential files

An index-sequential file is an inverted file in which for every keyword Ki , we have ni = hi = 1 and a11 <a21 . . . <am1. This situation can only

72