Page 83

Trees

Although computer scientists have adopted trees as file structures, their properties were originally investigated by mathematicians. In fact, a substantial part of the Theory of Graphs is devoted to the study of trees. Excellent books on the mathematical aspects of trees (and graphs) have been written by Berge[24], Harary et al.,[25 ]and Ore[26]. Harary's book also contains a useful glossary of concepts in graph theory. In addition Bertziss[3] and Knuth[27] discuss topics in graph theory with applications in information processing.

There are numerous definitions of trees. I have chosen a particularly simple one from Berge. If we think of a graph as a set of nodes (or points or vertices) and a set of lines (or edges) such that each line connects exactly two nodes, then a tree is defined to be a finite connected graph with no cycles, and possessing at least two nodes. To define a cycle we first define a chain. We represent the line uk joining two nodes x and y by uk = [x,y]. A chain is a sequence of lines, in which each line uk has one node in common with the preceding line uk-1, and the other vertex in common with the succeeding line uk+1. An example of a chain is [a,x1], [x1,x2], [x2,x3], [x3,b]. A cycle is a finite chain which begins at a node and terminates at the same node (i.e. in the example a = b).

Berge gives the following theorem showing many equivalent characterisations of trees.

Theorem. Let H be a graph with at least n nodes, where n > 1; any one of the following equivalent properties characterises a tree.

(1) H is connected and does not possess any cycles.

(2) H contains no cycles and has n - 1 lines.

(3) H is connected and has n - l lines.

(4) H is connected but loses this property if any line is deleted.

(5) Every pair of nodes is connected by one and only one chain.

One thing to be noticed in the discussion so far is that no mention has been made of a direction associated with a line. In most applications in computer science (and IR) one node is singled out as special. This node is normally called the root of the tree, and every other node in the tree can only be reached by starting at the root and proceeding along a chain of lines until the node sought is reached. Implicitly therefore, a direction is associated with each line. In fact, when one comes to represent a tree inside a computer by a list structure, often the addresses are stored in a way which allows movement in only one direction. It is convenient to think of a tree as a directed graph with a reserved node as the root of the tree. Of course, if one has a root then each path (directed chain) starting at the root will