Consider an image labelling problem where object $Z$ is to be assigned to one of $m$ possible semantic categories $\{\omega_1, \dots, \omega_m\}$. Let us assume that we have $R$ experts, each representing the given object by a distinct measurement vector. Denote the measurement vector used by the $i$-th expert by $\mathbf{x}_i$. In the measurement space each class $\omega_k$ is modelled by the probability density function $p(\mathbf{x}_i \mid \omega_k)$ and its a priori probability of occurrence is denoted $P(\omega_k)$. We shall consider the models to be mutually exclusive, which means that only one model can be associated with each object.
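According to Bayesian theory, given measurements $\mathbf{x}_i$, $i = 1, \dots, R$, the object $Z$ should be assigned to class $\omega_j$, i.e. its label $\theta$ should assume value $\theta = \omega_j$, provided the a posteriori probability of that interpretation is maximum, i.e.

$$\text{assign } \theta \to \omega_j \quad \text{if} \quad P(\omega_j \mid \mathbf{x}_1, \dots, \mathbf{x}_R) = \max_{k} P(\omega_k \mid \mathbf{x}_1, \dots, \mathbf{x}_R) \tag{1}$$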
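Let us rewrite the a posteriori probability using the Bayes theorem. We have

$$P(\omega_k \mid \mathbf{x}_1, \dots, \mathbf{x}_R) = \frac{p(\mathbf{x}_1, \dots, \mathbf{x}_R \mid \omega_k)\, P(\omega_k)}{p(\mathbf{x}_1, \dots, \mathbf{x}_R)} \tag{2}$$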
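where $p(\mathbf{x}_1, \dots, \mathbf{x}_R \mid \omega_k)$ is the class conditional joint probability density and $p(\mathbf{x}_1, \dots, \mathbf{x}_R)$ is the unconditional measurement joint probability density. The latter can be expressed in terms of the conditional measurement distributions as

$$p(\mathbf{x}_1, \dots, \mathbf{x}_R) = \sum_{j=1}^{m} p(\mathbf{x}_1, \dots, \mathbf{x}_R \mid \omega_j)\, P(\omega_j)$$

and therefore, in the following, we can concentrate only on the numerator terms of (2).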
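Since the representations used by the experts are distinct, it may be reasonable to assume that the measurements will be conditionally statistically independent, i.e.

$$p(\mathbf{x}_1, \dots, \mathbf{x}_R \mid \omega_k) = \prod_{i=1}^{R} p(\mathbf{x}_i \mid \omega_k) \tag{3}$$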
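where $p(\mathbf{x}_i \mid \omega_k)$ is the measurement process model of the $i$-th representation. Substituting this independence assumption into the Bayes expression (2) we find

$$P(\omega_k \mid \mathbf{x}_1, \dots, \mathbf{x}_R) = \frac{P(\omega_k) \prod_{i=1}^{R} p(\mathbf{x}_i \mid \omega_k)}{\sum_{j=1}^{m} P(\omega_j) \prod_{i=1}^{R} p(\mathbf{x}_i \mid \omega_j)}$$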
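and using this result in the maximum a posteriori decision rule we obtain the decision rule

$$\text{assign } \theta \to \omega_j \quad \text{if} \quad P(\omega_j) \prod_{i=1}^{R} p(\mathbf{x}_i \mid \omega_j) = \max_{k} P(\omega_k) \prod_{i=1}^{R} p(\mathbf{x}_i \mid \omega_k)$$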
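or, in terms of the a posteriori probabilities yielded by the respective experts, using $p(\mathbf{x}_i \mid \omega_k) = P(\omega_k \mid \mathbf{x}_i)\, p(\mathbf{x}_i) / P(\omega_k)$,

$$\text{assign } \theta \to \omega_j \quad \text{if} \quad P^{-(R-1)}(\omega_j) \prod_{i=1}^{R} P(\omega_j \mid \mathbf{x}_i)\, p(\mathbf{x}_i) = \max_{k} P^{-(R-1)}(\omega_k) \prod_{i=1}^{R} P(\omega_k \mid \mathbf{x}_i)\, p(\mathbf{x}_i) \tag{4}$$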
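To make the product rule (4) concrete, the following is a minimal Python sketch, not the authors' implementation. The function names and the toy numbers are hypothetical; the score for class $\omega_k$ is taken to be $P^{-(R-1)}(\omega_k) \prod_i P(\omega_k \mid \mathbf{x}_i)\, p(\mathbf{x}_i)$, and the mixture densities are simply set to 1 for illustration.

```python
from math import prod

def product_rule_scores(priors, posteriors, densities):
    """Score class k as P(w_k)^-(R-1) * prod_i [P(w_k | x_i) * p(x_i)].

    priors[k]        -- a priori probability P(w_k)
    posteriors[i][k] -- a posteriori probability P(w_k | x_i) from expert i
    densities[i]     -- mixture density p(x_i) of expert i (assumed given)
    """
    R = len(posteriors)
    scores = []
    for k, prior in enumerate(priors):
        joint = prod(posteriors[i][k] * densities[i] for i in range(R))
        scores.append(prior ** (-(R - 1)) * joint)
    return scores

def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical toy case: two classes, three experts.
priors = [0.5, 0.5]
posteriors = [
    [0.90, 0.10],  # expert 1 favours class 0
    [0.80, 0.20],  # expert 2 favours class 0
    [0.01, 0.99],  # expert 3 almost rules out class 0
]
densities = [1.0, 1.0, 1.0]

scores = product_rule_scores(priors, posteriors, densities)
label = argmax(scores)
```

In this toy case two of the three experts favour class 0, yet the near-zero output of the third is enough to make class 1 win under the product.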
The decision rule (4) quantifies the likelihood of a hypothesis by combining the a posteriori probabilities generated by the individual experts by means of a product rule. It is effectively a severe rule for fusing the expert outputs, as it is sufficient for a single recognition engine to inhibit a particular interpretation by outputting a probability close to zero for it. We shall adopt the approach used in  to show that under certain assumptions this severe rule can be developed into a benevolent information fusion rule which has the form of a sum. Let us express the product of the a posteriori probabilities and mixture densities on the right hand side of (4) as

$$P(\omega_k \mid \mathbf{x}_i)\, p(\mathbf{x}_i) = P(\omega_k)\, p_i^0\, (1 + \delta_{ki}) \tag{5}$$
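where $p_i^0$ is a nominal reference value of the mixture density $p(\mathbf{x}_i)$. A suitable choice of $p_i^0$ is for instance $p_i^0 = \max_{\mathbf{x}_i} p(\mathbf{x}_i)$. Substituting (5) for the a posteriori probabilities in (4) we find

$$P^{-(R-1)}(\omega_k) \prod_{i=1}^{R} P(\omega_k \mid \mathbf{x}_i)\, p(\mathbf{x}_i) = P(\omega_k) \prod_{i=1}^{R} p_i^0\, (1 + \delta_{ki}) \tag{6}$$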
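If we expand the product and neglect any terms of second and higher order in $\delta_{ki}$, we can approximate the right hand side of (6) as

$$P(\omega_k) \prod_{i=1}^{R} p_i^0\, (1 + \delta_{ki}) \approx P(\omega_k) \left[ \prod_{i=1}^{R} p_i^0 \right] \left( 1 + \sum_{i=1}^{R} \delta_{ki} \right) \tag{7}$$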
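Substituting (7) and (5) into (4), eliminating $\delta_{ki}$, and dropping the common factor $\prod_i p_i^0$, which does not depend on the class, we obtain a sum decision rule

$$\text{assign } \theta \to \omega_j \quad \text{if} \quad (1-R)\, P(\omega_j) + \sum_{i=1}^{R} P(\omega_j \mid \mathbf{x}_i)\, \frac{p(\mathbf{x}_i)}{p_i^0} = \max_{k} \left[ (1-R)\, P(\omega_k) + \sum_{i=1}^{R} P(\omega_k \mid \mathbf{x}_i)\, \frac{p(\mathbf{x}_i)}{p_i^0} \right] \tag{8}$$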
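The sum rule (8) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than a reference implementation: the function name, the toy numbers and the choice of reference densities equal to 1 are all hypothetical.

```python
def sum_rule_scores(priors, posteriors, densities, reference_densities):
    """Score class k as (1 - R) * P(w_k) + sum_i P(w_k | x_i) * p(x_i) / p_i^0.

    priors[k]              -- a priori probability P(w_k)
    posteriors[i][k]       -- a posteriori probability P(w_k | x_i) from expert i
    densities[i]           -- mixture density p(x_i) of expert i
    reference_densities[i] -- nominal reference value p_i^0 for expert i
    """
    R = len(posteriors)
    scores = []
    for k, prior in enumerate(priors):
        weighted = sum(
            posteriors[i][k] * densities[i] / reference_densities[i]
            for i in range(R)
        )
        scores.append((1 - R) * prior + weighted)
    return scores

# Hypothetical toy case: expert 3 assigns almost zero probability to class 0.
priors = [0.5, 0.5]
posteriors = [[0.90, 0.10], [0.80, 0.20], [0.01, 0.99]]
densities = [1.0, 1.0, 1.0]
reference_densities = [1.0, 1.0, 1.0]

scores = sum_rule_scores(priors, posteriors, densities, reference_densities)
label = max(range(len(scores)), key=scores.__getitem__)
```

With these numbers the single dissenting expert no longer vetoes class 0: the sum keeps the majority opinion, which is the benevolent behaviour described in the text.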
This approximation will be valid provided that $|\delta_{ki}| \ll 1$ for all $k$ and $i$. It can be easily established from (5) that this condition will be satisfied if

$$\delta_{ki} = \frac{P(\omega_k \mid \mathbf{x}_i)\, p(\mathbf{x}_i) - P(\omega_k)\, p_i^0}{P(\omega_k)\, p_i^0}$$

is small in absolute value. Note that this condition will hold when the amount of information about the class identity of the object gained by observing $\mathbf{x}_i$ is small, so that $P(\omega_k \mid \mathbf{x}_i)$ stays close to $P(\omega_k)$, and the observation is representative of the distribution of $\mathbf{x}_i$, which means that $p(\mathbf{x}_i)$ will be close to the reference value $p_i^0$. However, whatever approximation error is introduced when the conditions do not hold, we shall see later that the adoption of the approximation has some other benefits which will justify even the introduction of relatively gross errors at this step.
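The effect of neglecting second and higher order terms can be checked numerically. The sketch below compares $\prod_i (1 + \delta_{ki})$ with its first-order approximation $1 + \sum_i \delta_{ki}$; the function name and the sample values of $\delta_{ki}$ are hypothetical and chosen only for illustration.

```python
from math import prod

def first_order_error(deltas):
    """Compare prod_i (1 + delta_i) with its first-order expansion 1 + sum_i delta_i.

    Returns (exact, approximate, absolute error).
    """
    exact = prod(1.0 + d for d in deltas)
    approximate = 1.0 + sum(deltas)
    return exact, approximate, abs(exact - approximate)

# Small deviations: the neglected higher-order terms are negligible.
small = first_order_error([0.02, -0.01, 0.03])

# Large deviations: the neglected terms dominate.
large = first_order_error([0.8, -0.6, 0.9])
```

For the small deviations the two expressions agree to within about $10^{-4}$, whereas for the large ones the error is of the same order as the quantities themselves, which is the regime the text refers to as introducing relatively gross errors.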
Before proceeding any further, it may be pertinent to ask why we did not cancel out the unconditional probability density functions $p(\mathbf{x}_i)$ from the decision rule. The main reason is that this term conveys very useful information about the confidence of the expert in the observation made. It is clear that an object representation for which the value of the probability density is very small for all the classes will be an outlier and should not be classified by the respective expert. By retaining this information, the sum information fusion rule will automatically control the influence of such outliers on the final decision. In other words, the expert fusion rule in (8) is a weighted average rule where the weights reflect the confidence in the soft decision values computed by the individual experts. Thus our decision rule (8) can be expressed as

$$\text{assign } \theta \to \omega_j \quad \text{if} \quad (1-R)\, P(\omega_j) + \sum_{i=1}^{R} w_i(\mathbf{x}_i)\, P(\omega_j \mid \mathbf{x}_i) = \max_{k} \left[ (1-R)\, P(\omega_k) + \sum_{i=1}^{R} w_i(\mathbf{x}_i)\, P(\omega_k \mid \mathbf{x}_i) \right] \tag{9}$$

with weights $w_i(\mathbf{x}_i) = p(\mathbf{x}_i) / p_i^0$.
The main practical difficulty with the weighted average expert opinion combiner as specified in (9) is that not all experts will have the inner capability to output the density information needed for the weights. For instance, it would not be provided by a multilayer perceptron and many other classification methods. We shall therefore limit our objectives somewhat and identify weights which reflect the relative confidence in the experts in expectation. This can be done easily by selecting the weight values so as to minimise the empirical classification error count produced by the decision rule

$$\text{assign } \theta \to \omega_j \quad \text{if} \quad (1-R)\, P(\omega_j) + \sum_{i=1}^{R} w_i\, P(\omega_j \mid \mathbf{x}_i) = \max_{k} \left[ (1-R)\, P(\omega_k) + \sum_{i=1}^{R} w_i\, P(\omega_k \mid \mathbf{x}_i) \right] \tag{10}$$
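in which the data dependence of the weights has been suppressed. In other words, we find weights $w_i$, $i = 1, \dots, R$, such that the error count over the $N$ training samples

$$E = \sum_{k=1}^{N} \epsilon(Z_k)$$

is minimised, where $Z_k$ is the $k$-th training sample and $\epsilon(Z_k)$ takes values

$$\epsilon(Z_k) = \begin{cases} 0 & \text{if } \hat{\theta}_k = \theta_k \\ 1 & \text{otherwise} \end{cases} \tag{11}$$

In (11), $\theta_k$ is the true class label of object $Z_k$ and $\hat{\theta}_k$ is the class label assigned to it by the decision rule (10). The optimisation can easily be achieved by an exhaustive search through the weight space.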
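The weight selection described above can be sketched as follows. This is a minimal illustration, assuming the continuous weight space is discretised into a small grid (the text does not specify how the search is organised); the function names, the grid and the toy training set are hypothetical.

```python
from itertools import product

def fuse(priors, posteriors, weights):
    """Label chosen by the weighted sum rule: argmax_k (1-R)P(w_k) + sum_i w_i P(w_k|x_i)."""
    R = len(posteriors)
    scores = [
        (1 - R) * prior + sum(w * post[k] for w, post in zip(weights, posteriors))
        for k, prior in enumerate(priors)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

def error_count(priors, samples, weights):
    """Number of training samples whose fused label differs from the true label."""
    return sum(fuse(priors, posts, weights) != label for posts, label in samples)

def exhaustive_search(priors, samples, grid):
    """Try every weight vector on the grid and keep the first one with the fewest errors."""
    R = len(samples[0][0])
    best_weights, best_errors = None, None
    for weights in product(grid, repeat=R):
        errors = error_count(priors, samples, weights)
        if best_errors is None or errors < best_errors:
            best_weights, best_errors = weights, errors
    return best_weights, best_errors

# Hypothetical training set: (posteriors per expert, true label).
# Expert 1 is reliable, expert 2 is frequently wrong.
priors = [0.5, 0.5]
samples = [
    ([[0.70, 0.30], [0.20, 0.80]], 0),
    ([[0.60, 0.40], [0.10, 0.90]], 0),
    ([[0.30, 0.70], [0.40, 0.60]], 1),
    ([[0.35, 0.65], [0.90, 0.10]], 1),
]
grid = [0.0, 0.5, 1.0]

best_weights, best_errors = exhaustive_search(priors, samples, grid)
```

On this toy data the search assigns zero weight to the unreliable second expert. The cost of such a search grows as the grid size raised to the power $R$, so it is practical only for a small number of experts and a coarse grid.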
Figure 1: Weighted averaging fusion of multiple expert opinions.
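For equal a priori class probabilities the term $(1-R)\, P(\omega_k)$ is the same for every class, and the decision rule (10) simplifies to:

$$\text{assign } \theta \to \omega_j \quad \text{if} \quad \sum_{i=1}^{R} w_i\, P(\omega_j \mid \mathbf{x}_i) = \max_{k} \sum_{i=1}^{R} w_i\, P(\omega_k \mid \mathbf{x}_i)$$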
The weighted averaging combiner is schematically represented in Figure 1.