Page 121


In general, the dependence can be arbitrarily complex, as the following identity illustrates:

$$P(x) = P(x_1)\,P(x_2/x_1)\,P(x_3/x_1,x_2)\cdots P(x_n/x_1,x_2,\ldots,x_{n-1})$$
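To make the identity concrete, here is a minimal sketch, assuming binary variables and a small, randomly generated joint distribution (the names `joint`, `marginal`, `chain_rule_product` and `full_expansion_parameters` are hypothetical). Each factor is computed from the joint as a ratio of marginals, and the product telescopes back to P(x). The last function counts how many probabilities the full expansion needs: the i-th factor is conditioned on i - 1 binary variables, so it needs 2^(i-1) numbers, which is the growth that makes the exact expansion impractical.

```python
import itertools
import random

# Hypothetical joint distribution over n = 3 binary variables: a dict mapping
# each configuration (x1, x2, x3) to a probability (random, then normalised).
n = 3
random.seed(0)
weights = {x: random.random() for x in itertools.product((0, 1), repeat=n)}
total = sum(weights.values())
joint = {x: w / total for x, w in weights.items()}


def marginal(prefix):
    """P(x1, ..., xk) for the first k = len(prefix) variables."""
    k = len(prefix)
    return sum(p for x, p in joint.items() if x[:k] == prefix)


def chain_rule_product(x):
    """P(x1) P(x2/x1) ... P(xn/x1, ..., xn-1), each factor derived from the joint."""
    prod = 1.0
    for i in range(n):
        # P(x_{i+1} / x_1, ..., x_i) = P(x_1, ..., x_{i+1}) / P(x_1, ..., x_i)
        prod *= marginal(x[: i + 1]) / marginal(x[:i])
    return prod


def full_expansion_parameters(n):
    """Free parameters in the full expansion for n binary variables.

    Factor i conditions on i - 1 binary variables, so it needs one
    probability per configuration of its conditioning set: 2 ** (i - 1).
    """
    return sum(2 ** (i - 1) for i in range(1, n + 1))


# The chain-rule product reproduces the joint exactly (up to rounding).
for x in joint:
    assert abs(chain_rule_product(x) - joint[x]) < 1e-12

print(full_expansion_parameters(6))   # 63, i.e. 2**6 - 1
print(full_expansion_parameters(20))  # 1048575
```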

Therefore, to capture all the dependence information we would need to condition each variable in turn on a steadily increasing set of other variables. Although in principle this may be possible, it is likely to be computationally inefficient, and impossible in some instances where there is insufficient data to calculate the high-order dependencies. Instead we adopt a method of approximation to estimate P(x) which captures the significant dependence information. Intuitively, this may be described as one which looks at each factor in the above expansion and selects from the conditioning variables the one particular variable which accounts for most of the dependence relation. In other words, we seek a product approximation of the form

$$P_t(x) = \prod_{i=1}^{n} P(x_{m_i}/x_{m_{j(i)}}), \qquad 0 \le j(i) < i \qquad \text{(A2)}$$

where (m1, m2, ..., mn) is a permutation of the integers 1, 2, ..., n, j(.) is a function mapping i into integers less than i, and P(xi/xm0) is P(xi). An example for a six-component vector x = (x1, ..., x6) might be

$$P_t(x) = P(x_1)\,P(x_2/x_1)\,P(x_3/x_2)\,P(x_4/x_2)\,P(x_5/x_2)\,P(x_6/x_5)$$
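The six-variable example can be evaluated directly. The sketch below is illustrative only: it assumes binary variables and a randomly generated "true" P(x) (the names `joint`, `marg` and `p_tree` are hypothetical). The example is encoded as the pair (m, j) from A2, with m the natural order and j = (0, 1, 2, 2, 2, 5), and every factor P(x_{m_i}/x_{m_j(i)}) is computed from marginals of the joint. Because each variable is conditioned on one that appears earlier in the ordering, the resulting Pt(x) sums to 1 over all 64 configurations.

```python
import itertools
import random

# The six-variable example as (m, j): m is the natural order and j(i) is the
# position in m of the conditioning variable (0 means "unconditioned").
m = (1, 2, 3, 4, 5, 6)
j = (0, 1, 2, 2, 2, 5)

# Hypothetical true distribution P(x) over six binary variables.
random.seed(1)
weights = {x: random.random() for x in itertools.product((0, 1), repeat=6)}
total = sum(weights.values())
joint = {x: w / total for x, w in weights.items()}


def marg(assign):
    """Marginal probability of a partial assignment {variable number: value}."""
    return sum(p for x, p in joint.items()
               if all(x[v - 1] == val for v, val in assign.items()))


def p_tree(x, m, j):
    """Evaluate P_t(x) = prod_i P(x_{m_i} / x_{m_j(i)}) using marginals of joint."""
    prod = 1.0
    for i in range(1, len(m) + 1):
        assert 0 <= j[i - 1] < i, "A2 requires 0 <= j(i) < i"
        child = m[i - 1]
        if j[i - 1] == 0:
            # P(x_{m_i} / x_{m_0}) is just the unconditioned P(x_{m_i})
            prod *= marg({child: x[child - 1]})
        else:
            par = m[j[i - 1] - 1]
            prod *= (marg({child: x[child - 1], par: x[par - 1]})
                     / marg({par: x[par - 1]}))
    return prod


pt = {x: p_tree(x, m, j) for x in joint}
print(round(sum(pt.values()), 9))  # 1.0: P_t is itself a distribution

# Free parameters for binary variables: 1 for the root, 2 per conditional.
print(1 + 2 * (len(m) - 1))  # 11, versus 63 for the full expansion
```

The parameter count is the practical point: for binary variables each conditional P(xi/xk) needs only two numbers, so the tree grows linearly in n rather than exponentially.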

Notice how similar the A2 assumption is to the independence assumption A1, the only difference being that in A2 each factor has a conditioning variable associated with it. In the example the permutation (m1, m2, ..., m6) is (1, 2, ..., 6), which is just the natural order. Of course, the reason for writing the expansion for Pt(x) the way I did in A2 is to show that a permutation of (1, 2, ..., 6) must be sought that gives a good approximation. Once this permutation has been found, the variables could be relabelled so as to have the natural order again.
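The relabelling step can be sketched as follows, using a hypothetical four-variable permutation m = (3, 1, 4, 2) with j = (0, 1, 2, 1) that is not taken from the text. Renaming x_{m_i} as y_i puts the expansion back into natural order, and j(.) carries over unchanged as the parent function of the new labels. The helper names `expansion`, `relabel` and `to_natural` are illustrative only.

```python
def expansion(m, j, var="x"):
    """Write out prod_i P(x_{m_i} / x_{m_j(i)}) as text, in the order of m."""
    terms = []
    for i, child in enumerate(m, start=1):
        if j[i - 1] == 0:
            terms.append(f"P({var}{child})")
        else:
            terms.append(f"P({var}{child}/{var}{m[j[i - 1] - 1]})")
    return "".join(terms)


def relabel(m):
    """Rename variable x_{m_i} as y_i so that the expansion is in natural order.

    Returns the renaming {old number: new number} and the natural order
    (1, ..., n); after renaming, y_i is conditioned on y_{j(i)}.
    """
    rename = {old: new for new, old in enumerate(m, start=1)}
    natural = tuple(range(1, len(m) + 1))
    return rename, natural


def to_natural(x, m):
    """Reorder a data vector (x1, ..., xn) into (y1, ..., yn) = (x_{m_1}, ..., x_{m_n})."""
    return tuple(x[old - 1] for old in m)


# Hypothetical four-variable permutation and j(.) (not from the text).
m = (3, 1, 4, 2)
j = (0, 1, 2, 1)

rename, natural = relabel(m)
print(expansion(m, j))              # P(x3)P(x1/x3)P(x4/x1)P(x2/x3)
print(rename)                       # {3: 1, 1: 2, 4: 3, 2: 4}
print(expansion(natural, j, "y"))   # P(y1)P(y2/y1)P(y3/y2)P(y4/y1)
print(to_natural((10, 20, 30, 40), m))  # (30, 10, 40, 20)
```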

The permutation and the function j(.) together define a dependence tree, and the corresponding Pt(x) is called a probability distribution of (first-order) tree dependence. The tree corresponding to our six-variable example is shown in Figure 6.1. The tree shows which variables appear on either side of the conditioning stroke in P(./.). Although I have chosen to write the function Pt(x) with x1 as the unconditioned variable (and hence the root of the tree) and all others consistently conditioned each on its parent node, in fact any one of the nodes of the tree could be singled out as the root, as long as the conditioning is done consistently with respect to the new root node. (In Figure 6.1 the 'direction' of conditioning is marked by the direction associated with
