nice property of being invariant under one-to-one transformations of the co-ordinates.
Other interesting properties of this measure may be found in Osteyee and Good[19].
Rajski[20] shows how I(xi, xj) may be simply transformed into a distance function on discrete probability distributions.
I(xi, xj) is often interpreted as a measure of the statistical information contained in xi about xj (or vice versa).
When we apply this function to measure the association between two index terms, say i and j, then xi and xj are binary variables.
Thus P(xi = 1) will be the probability of occurrence of the term i and similarly P(xi = 0) will be the probability of its non-occurrence.
The association between two index terms i and j is then measured by I(xi, xj), which quantifies the extent to which their distributions deviate from stochastic independence.
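As a concrete illustration, the following is a minimal Python sketch of this calculation from co-occurrence data; the document counts and the function name emim are hypothetical, invented purely for the example.

```python
import math

def emim(p_joint):
    """Expected mutual information I(xi, xj) for two binary variables.

    p_joint[a][b] is the joint probability P(xi = a, xj = b), with a, b in {0, 1}.
    """
    # Marginal distributions P(xi = a) and P(xj = b)
    p_i = [p_joint[a][0] + p_joint[a][1] for a in (0, 1)]
    p_j = [p_joint[0][b] + p_joint[1][b] for b in (0, 1)]
    total = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = p_joint[a][b]
            if p_ab > 0.0:                      # the term 0 log 0 is taken as 0
                total += p_ab * math.log2(p_ab / (p_i[a] * p_j[b]))
    return total

# Hypothetical counts: out of 100 documents, term i occurs in 40,
# term j occurs in 30, and the two co-occur in 20.
N, n_i, n_j, n_ij = 100, 40, 30, 20
joint = [[(N - n_i - n_j + n_ij) / N, (n_j - n_ij) / N],
         [(n_i - n_ij) / N, n_ij / N]]
print(emim(joint))   # deviation from stochastic independence, in bits
```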
A function very similar to the expected mutual information measure was suggested by Jardine and Sibson[2] specifically to measure dissimilarity between two classes of objects.
For example, we may be able to discriminate two classes on the basis of their probability distributions over a simple two-point space {1, 0}.
Thus let P1(1), P1(0) and P2(1), P2(0) be the probability distributions associated with class I and II respectively.
Now on the basis of the difference between them we measure the dissimilarity between I and II by what Jardine and Sibson call the Information Radius, which is

u[P1(1) log (P1(1)/(uP1(1) + vP2(1))) + P1(0) log (P1(0)/(uP1(0) + vP2(0)))]
+ v[P2(1) log (P2(1)/(uP1(1) + vP2(1))) + P2(0) log (P2(0)/(uP1(0) + vP2(0)))]

Here u and v are positive weights adding to unity.
This function is readily generalised to multi-state, or indeed continuous, distributions.
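The two-point case above can be sketched in Python as follows; the function name information_radius and the example distributions are assumptions made for illustration, not taken from Jardine and Sibson[2].

```python
import math

def information_radius(p1, p2, u, v):
    """Information radius between two distributions over the same discrete
    space, with positive weights u and v adding to unity.

    p1 and p2 are dicts giving P1(x) and P2(x) for each point x.
    """
    assert u > 0 and v > 0 and abs(u + v - 1.0) < 1e-9
    total = 0.0
    for x in p1:
        mix = u * p1[x] + v * p2[x]            # weighted mixture of the two distributions
        if p1[x] > 0.0:
            total += u * p1[x] * math.log2(p1[x] / mix)
        if p2[x] > 0.0:
            total += v * p2[x] * math.log2(p2[x] / mix)
    return total

# Hypothetical two-point distributions for classes I and II
P1 = {1: 0.8, 0: 0.2}
P2 = {1: 0.3, 0: 0.7}
print(information_radius(P1, P2, u=0.5, v=0.5))
```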
It is also easy to show that under a suitable interpretation the expected mutual information measure is a special case of the information radius.
This fact will be of some importance in Chapter 6.
To see it we write P1(.) and P2(.) as two conditional distributions P(./w1) and P(./w2).
If we now set u = P(w1) and v = P(w2), that is, the prior probabilities of the conditioning variables in P(./wi), then on substituting into the expression for the information radius and using the identities
P(x) = P(x/w1) P(w1) + P(x/w2) P(w2) x = 0, 1
P(x, wi) = P(x/wi) P(wi) i = 1, 2
we recover the expected mutual information measure I(x,wi).
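A quick numerical check of this reduction, under an invented joint distribution P(x, wi), is sketched below: with u = P(w1) and v = P(w2), the information radius of the two conditional distributions comes out equal to I(x, wi).

```python
import math

# Hypothetical joint distribution P(x, wi) over x in {0, 1} and wi in {w1, w2}
p_joint = {(1, 'w1'): 0.15, (0, 'w1'): 0.25,
           (1, 'w2'): 0.35, (0, 'w2'): 0.25}

# Marginals P(wi), P(x) and the conditionals P(x/wi)
p_w = {w: sum(p for (x, wi), p in p_joint.items() if wi == w) for w in ('w1', 'w2')}
p_x = {x: sum(p for (xi, wi), p in p_joint.items() if xi == x) for x in (0, 1)}
p_cond = {w: {x: p_joint[(x, w)] / p_w[w] for x in (0, 1)} for w in ('w1', 'w2')}

# Expected mutual information I(x, wi)
emim = sum(p * math.log2(p / (p_x[x] * p_w[w])) for (x, w), p in p_joint.items())

# Information radius of the two conditionals, weighted by the priors P(wi)
u, v = p_w['w1'], p_w['w2']
rad = 0.0
for x in (0, 1):
    mix = u * p_cond['w1'][x] + v * p_cond['w2'][x]   # equals P(x) by the first identity
    rad += u * p_cond['w1'][x] * math.log2(p_cond['w1'][x] / mix)
    rad += v * p_cond['w2'][x] * math.log2(p_cond['w2'][x] / mix)

print(emim, rad)   # the two values agree
```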