Page 87

Essentially, it involves storing a number of nodes together in one 'page' of disk storage. During a disk access this page is brought into fast memory, is then searched, and the next page to be accessed is determined.

Scatter storage or hash addressing

One file structure which does not relate very well to the ones mentioned before is known as Scatter Storage. The technique by which the file structure is implemented is often called Hash Addressing. Its underlying principle is appealingly simple. Given that we may access the data through a number of keys Ki, then the address of the data in store is located through a key transformation function f which when applied to Ki evaluates to give the address of the associated data. We are assuming here that with each key is associated only one data item. Also for convenience we will assume that each record (data and key) fits into one location, whose address is in the image space of f. The addresses given by the application of f to the keys Ki are called the hash addresses and f is called a hashing function. Ideally f should be such that it spreads the hash addresses uniformly over the available storage. Of course this would be achieved if the function were one-to-one. Unfortunately this cannot be so because the range of possible key values is usually considerably larger than the range of the available storage addresses. Therefore, given any hashing function we have to contend with the fact that two distinct keys Ki and Kj are likely to map to the same address f(Ki) (=f(Kj)). Before I explain some of the ways of dealing with this I shall give a few examples of hashing functions.

Let us assume that the available storage is of size 2[m] then three simple transformations are as follows:

(1) if Ki is the key, then take the square of its binary representation and select m bits from the middle of the result;

(2) cut the binary representation of Ki into pieces each of m bits and add these together. Now select the m least significant bits of the sum as the hash address;

(3) divide the integer corresponding to Ki by the length of the available store 2[m] and use the remainder as the hash address.

Each of these methods has disadvantages. For example, the last one may given the same address rather frequently if there are patterns in the keys. Before using a particular method, the reader is advised to consult the now extensive literature on the subject, e.g. Morris[29], or Lum et al.[30].