Page 71

To unify the discussion of file structures we need some further concepts. Following Hsiao and Harary again, we define a list L of records with respect to a keyword K, or more briefly a K-list as a set of records containing K such that:

(1) the K-pointers are distinct;

(2) each non-null K-pointer in L gives the address of a record within L;

(3) there is a unique record in L not pointed to by any record containing K; it is called the beginning of the list; and

(4) there is a unique record in L containing the null K-pointer; it is the end of the list.

(Hsiao and Harary state condition (2) slightly differently so that no two K-lists have a record in common; this only appears to complicate things.)

From our previous example:

K1-list : R1, R2, R5

K2-list : R2, R4

K4-list : R1, R2, R3

Finally, we need the definition of a directory of a file. Let F be a file whose records contain just m different keywords K1, K2, . . . , Km. Let ni be the number of records containing the keyword Ki, and hi be the number of Ki-lists in F. Furthermore, we denote by aij the beginning address of the jth Ki-list. Then the directory is the set of sequences

(Ki, ni, hi, ai1, ai2, . . . aihi) i = 1, 2, . . . m

We are now in a position to give a unified treatment of sequential files, inverted files, index-sequential files and multi-list files.

Sequential files

A sequential file is the most primitive of all file structures. It has no directory and no linking pointers. The records are generally organised in lexicographic order on the value of some key. In other words, a particular attribute is chosen whose value will determine the order of the records. Sometimes when the attribute value is constant for a large number of records a second key is chosen to give an order when the first key fails to discriminate.

The implementation of this file structure requires the use of asorting routine.