Its main advantages are:
(1) it is easy to implement;
(2) it provides fast access to the next record using lexicographic order.
Its disadvantages:
(1) it is difficult to update - inserting a new record may require moving a large proportion of the file;
(2) random access is extremely slow.
Sometimes a file is considered to be sequentially organised despite the fact that it is not ordered according to any key.
Perhaps the date of acquisition is considered to be the key value, the newest entries are added to the end of the file and therefore pose no difficulty to updating.
Inverted files
The importance of this file structure will become more apparent when Boolean Searches are discussed in the next chapter.
For the moment we limit ourselves to describing its structure.
An inverted file is a file structure in which every list
contains only one record.
Remember that a list is defined with
respect to a keyword K, so every K-list contains only
one record.
This implies that the directory will be such that
ni = hi for all i, that is, the number of records
containing Ki will equal the number of Ki-lists.
So
the directory will have an address for each record containing
Ki .
For document retrieval this means that given a keyword
we can immediately locate the addresses of all the documents
containing that keyword.
For the previous example let us assume that
a non-black entry in the field corresponding to an attribute indicates
the presence of a keyword and a black entry its absence.
Then the
directory will point to the file in the way shown in Figure
4.3.
The definition of an inverted files does not require
that the addresses in the directory are in any order.
However, to
facilitate operations such as conjunction ('and') and disjunction
('or') on any two inverted lists, the addresses are normally kept in
record number order.
This means that 'and' and 'or' operations can
be performed with one pass through both lists.
The penalty we pay is
of course that the inverted file becomes slower to update.
Index-sequential files
An index-sequential file is an inverted file in which for every keyword Ki , we have ni = hi = 1 and a11 <a21 . . . <am1.
This situation can only |