British Machine Vision Association and Society for Pattern Recognition

Probabilistic Models in Vision and Signal Processing

A one-day BMVA symposium held in association with the RSS Statistical Image Analysis and Processing Study Group at the Royal Statistical Society, 12 Errol Street, London, UK, on 9th May 2001.

Download the PDF programme and registration form

Follow-up Special Issue Journal Call for Participation

Chairpersons: Richard Bowden (Brunel University), Charles C Taylor (University of Leeds)

10:30 Registration and coffee

10:55 Introduction and welcome

11:00 Bayesian Analysis for Data Fusion of Disparate Imaging Systems for Surveillance Purposes, G Jones, R Allsop, J Gilby, N Sumpter (SIRA/UCL/IRISYS)

11:30 Improved Audio Feature Extraction, Manuel Davy, Patrick Wolfe (Cambridge University)

12:00 Target Detection and Tracking in Image Sequences Using the Competitive Attentional Tracker, Malcolm Strens, Ian Gregory (CRMV, DERA)

12:30 Generative-Model-Based Tracking of Clusters and Contours, Arthur Pece (University of Copenhagen)

13:00 Lunch

14:00 A Robust Adaptive Visual System for Multi-Target Tracking, Pakorn Kaewtrakulpong, Richard Bowden (Brunel University)

14:30 Growth Models for Shape Data, John T Kent (University of Leeds)

15:00 Practical Issues Regarding the Use of Bayes Theory for Automated Decision Making, Neil Thacker (University of Manchester)

15:30 Tea

15:45 Using Occlusions to Assist in Updating the State of an Articulated Body for Motion Capture, Maurice Ringer, Joan Lasenby (Cambridge University)

16:15 A Statistical Model for Human Motion Synthesis, Luis Molina Tanco, Adrian Hilton (University of Surrey)

16:45 Summary and discussion

16:50 Closing remarks and finish

REGISTRATION FORM: 9 May 2001 Meeting

Please return this form to Leanne Pring, Royston Parkin, 95 Queen Street, Sheffield, S1 1WG, Tel 0114 272 0306, Fax 0114 272 6158, or via email. The meeting is free to members of the BMVA, but a charge of £20 is payable by non-members. RSS members should register through the RSS. A sandwich lunch is included for all those who book before May 1st. When registering, please enclose a cheque for the appropriate amount made payable to "The British Machine Vision Association".

NAME: ………………………………………………………………………………….
ADDRESS: ………………………………………………………………………………….
TEL: ……………………………………


Bayesian Analysis for Data Fusion of Disparate Imaging Systems for Surveillance Purposes,
G. Jones (1), R. Allsop (3), J. Gilby (2), N. Sumpter (4)
1) Gwynfor Jones (presenter) is a Sira/UCL PTP Associate – Sira Ltd., South Hill, Chislehurst, Kent BR7 5EH, Tel: 020 8467 2636 x377, Fax: 020 8467 6515, email:
2) Sira Ltd., South Hill, Chislehurst, Kent, BR7 5EH, Tel: 020 8467 2636
3) Centre for Transport Studies, University College London, Gower Street, London, WC1E 6BT, Tel: 020 7679 7009
4) Irisys Ltd., Towcester Mill, Towcester, Northants, NN12 6AD, Tel: 01327 357824

Security surveillance is an ever-expanding industry as more organisations feel the need to protect their interests. Unfortunately, despite the number of research groups working on or with image-processing techniques and algorithms, the same problems keep recurring. These are associated with the need for any surveillance system to be tolerant of environmental changes as well as able to interpret a scene intelligently and distinguish between those activities which are illegal, or potentially so, and those which are not. Furthermore, intelligent surveillance systems share the problem of needing to raise an alarm at the right point: if the false-alarm rate is too high, the system will lose operator confidence; if the system is not sensitive enough, it risks missing significant events.

Work has been carried out to fuse the data from a low-cost thermal array with that of a CCTV camera. The array has a low spatial resolution and will only detect objects that are in a state of flux and are hotter than the background. This attribute has the potential for removing most causes of false alarms at little extra financial cost. To this end a calibration algorithm has been developed which maps correspondence between the two cameras. By concentrating the area of search on that indicated by the thermal camera, the analysis in both images can then be made more elaborate without undue computational effort.
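
The abstract does not give the form of the calibration, but a minimal sketch of such a camera-to-camera mapping is a least-squares affine transform fitted from matched point pairs (the function names and the choice of an affine model here are illustrative assumptions, not the authors' algorithm):

```python
import numpy as np

def fit_affine(thermal_pts, visual_pts):
    """Least-squares affine map from thermal-array coordinates to CCTV
    image coordinates, fitted from matched point pairs (illustrative)."""
    n = len(thermal_pts)
    A = np.hstack([thermal_pts, np.ones((n, 1))])   # rows are [x, y, 1]
    M, *_ = np.linalg.lstsq(A, visual_pts, rcond=None)
    return M                                        # (3, 2) matrix

def thermal_to_visual(M, pt):
    """Map a single thermal-camera point into the visual image."""
    return np.append(np.asarray(pt, dtype=float), 1.0) @ M
```

A region flagged by the thermal array can then be mapped through `thermal_to_visual` to delimit the search area in the CCTV frame.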

Experimentation is underway exploring approaches to data fusion within an optimal Bayesian framework. Initial results, using static images, involve processing areas of interest in the visual image as indicated by the thermal camera. This is achieved using a Markov random field (MRF) to segment the area. The MRF has been extended to incorporate the thermal information, with mixed results. Research is now extending this work to incorporate a temporal aspect into the fusion process.
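
To make the MRF idea concrete, here is a toy two-label segmentation by iterated conditional modes (ICM) that fuses visual appearance with co-registered thermal evidence. The energy terms and their weights (`beta`, `gamma`) are illustrative choices, not the formulation used in the work described above:

```python
import numpy as np

def icm_segment(visual, thermal, beta=1.0, gamma=2.0, n_iters=5):
    """Toy MRF segmentation by ICM. Labels: 0 = background, 1 = hot object.
    Per-pixel energy = appearance term + thermal bonus for label 1
    + beta * number of disagreeing 4-neighbours (Potts prior)."""
    labels = (thermal > 0.5).astype(int)              # initialise from thermal cue
    mu = np.array([visual[labels == 0].mean() if (labels == 0).any() else 0.0,
                   visual[labels == 1].mean() if (labels == 1).any() else 1.0])
    H, W = visual.shape
    for _ in range(n_iters):
        for y in range(H):
            for x in range(W):
                costs = []
                for lab in (0, 1):
                    data = (visual[y, x] - mu[lab]) ** 2          # appearance term
                    therm = -gamma * thermal[y, x] if lab == 1 else 0.0
                    disagree = sum(labels[ny, nx] != lab          # smoothness prior
                                   for ny, nx in ((y - 1, x), (y + 1, x),
                                                  (y, x - 1), (y, x + 1))
                                   if 0 <= ny < H and 0 <= nx < W)
                    costs.append(data + therm + beta * disagree)
                labels[y, x] = int(np.argmin(costs))
    return labels
```

Restricting the sweep to the thermally indicated window is what keeps the per-frame cost low.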

Improved Audio Feature Extraction,
Manuel Davy, Patrick Wolfe
Signal Processing Group, Cambridge University Engineering Dept.
University of Cambridge, Trumpington Street
Cambridge, CB2 1PZ
Phone: +44 1223 339708

Audio feature extraction lies at the heart of almost all audio signal processing applications, including audio coding, signal enhancement, multimedia indexing and retrieval, and automatic music transcription. Here we describe a statistical approach to audio signal processing, using a time-varying auto-regressive (TVAR) model. By modelling audio signals as the sum of an unknown number of events (`notes'), each comprising an unknown number of harmonics (`partials'), we demonstrate applications to audio feature extraction. The implementation involves a particle filter (recently introduced for TVAR models). Moreover, we show that such a framework benefits from the incorporation of prior knowledge through the use of time-frequency reassignment to determine appropriate prior distributions of the unknown model parameters.
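
As a deliberately minimal sketch of the TVAR-plus-particle-filter idea, consider a toy model in which a single AR(1) coefficient drifts as a random walk and is tracked by a bootstrap particle filter; the model in the talk (many notes, many partials) is far richer:

```python
import numpy as np

def tvar_particle_filter(y, n_particles=500, q=0.001, r=0.1, seed=0):
    """Bootstrap particle filter for a toy time-varying AR(1) model:
        a_t = a_{t-1} + N(0, q)          (slowly drifting AR coefficient)
        y_t = a_t * y_{t-1} + N(0, r)    (observed signal)
    Returns the filtered estimate of a_t at each step."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 0.5, n_particles)
    estimates = []
    for t in range(1, len(y)):
        particles = particles + rng.normal(0.0, np.sqrt(q), n_particles)  # propagate
        w = np.exp(-0.5 * (y[t] - particles * y[t - 1]) ** 2 / r)         # likelihood
        w /= w.sum()
        estimates.append(np.sum(w * particles))                           # posterior mean
        particles = particles[rng.choice(n_particles, n_particles, p=w)]  # resample
    return np.array(estimates)
```

In the full model the particle set would carry the parameters of every active note and partial rather than a single coefficient.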

Target Detection and Tracking in Image Sequences Using the Competitive Attentional Tracker,
Malcolm Strens, Ian Gregory
Centre for Robotics & Machine Vision
X107 Building
DERA Farnborough,
Hants. GU14 0LX. 01252 395602

We apply an approximate Bayesian method for multi-target tracking in infrared and visual band images, in the presence of noise and clutter. Track-before-detect and multi-target tracking prove intractable for a direct Bayesian approach (e.g. particle filters) because they are intrinsically very high-dimensional problems; one or more parameters must be estimated for every pixel in the image.

We will describe a representation for the probability density of a target's motion which is factored into separate position, velocity and acceleration distributions. Each of these distributions is represented as an image, sampled at the resolution of the input image (or a multiple thereof). An important benefit of this approach is that the state dynamics can be implemented by image convolution, and the combination of evidence is achieved by image multiplication. Furthermore, the instantaneous response image from a target detection filter can be regarded as a probability density and used as the input to the tracker. This "spatial PDF tracker" has been proven to track individual targets at very low signal-to-noise ratios, in infrared and visible-band imagery.
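
The convolve-then-multiply update can be sketched in one dimension (the tracker described above does this in 2-D with separate position, velocity and acceleration images; this illustrative version just shows the two core operations):

```python
import numpy as np

def pdf_tracker_step(position_pdf, motion_kernel, detector_response):
    """One update of a 1-D "spatial PDF" tracker (illustrative only).

    Dynamics: convolve the position density with a motion kernel.
    Evidence: multiply pointwise by the detection-filter response,
    treated as a likelihood, then renormalise.
    """
    predicted = np.convolve(position_pdf, motion_kernel, mode="same")  # dynamics
    posterior = predicted * detector_response                          # evidence
    return posterior / posterior.sum()
```

Because both operations are standard image operations, the whole update runs at pixel-map resolution with no explicit state vector.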

The Competitive Attentional Tracker (CAT) runs many PDF trackers in parallel so that they each track a different region or different motion within the scene. Each pixel in the image has an association probability to each PDF tracker; an expectation-maximisation-like process is used to determine this assignment. Essentially, the trackers compete to explain the pixels. Each PDF tracker receives bottom-up input only from pixels with non-zero association probabilities, and makes predictions over the same region. Thus each PDF tracker has a "focus of attention" in the joint position-velocity space. The CAT has been successfully applied to multi-target detection and tracking in infrared and visible-band imagery. The CAT has a clear interpretation as approximating Bayesian inference over the joint parameter space of all the individual PDF trackers.

We will demonstrate (i) the PDF tracker for infrared target tracking in high thermal noise; (ii) the CAT applied to tracking air targets through heavy cloud clutter; (iii) the CAT operating in track-before-detect mode.

Generative-Model-Based Tracking of Clusters and Contours,
Arthur Pece
Dept. of Computer Science
University of Copenhagen
Universitetsparken 1
DK-2100 Copenhagen

For the purposes of this talk, generative-model-based vision is a methodology which prescribes (1) the formulation of a parameterized probabilistic model of image generation, (2) maximization of the posterior probability (given an image or image sequence) of model parameters (state variables). The state variables are whatever people want to know, e.g. the position, size, shape, etc. of objects of interest.

Two algorithms that fit into this framework will be presented:

Cluster tracker: moving objects generate clusters of pixel values significantly different from the background. The cluster tracker uses the EM algorithm to determine the centroids and covariances of these clusters, i.e. the image position and image size of moving objects. Methods to decide how many distinct objects are moving will also be presented.
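
A generic EM sketch of this step, fitting 2-D Gaussian clusters to the coordinates of foreground pixels (the talk's exact formulation may differ; the initialisation and the small ridge on the covariances are illustrative choices):

```python
import numpy as np

def em_clusters(points, n_clusters, n_iters=20):
    """Fit 2-D Gaussian clusters to foreground-pixel coordinates with EM,
    recovering centroids (image positions) and covariances (image extents)."""
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    means = points[np.linspace(0, n - 1, n_clusters).astype(int)].copy()
    covs = np.array([np.eye(d) * points.var() for _ in range(n_clusters)])
    weights = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(n_iters):
        # E-step: responsibility of each cluster for each pixel
        resp = np.empty((n, n_clusters))
        for k in range(n_clusters):
            diff = points - means[k]
            mahal = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(covs[k]), diff)
            resp[:, k] = weights[k] * np.exp(-0.5 * mahal) / np.sqrt(np.linalg.det(covs[k]))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, centroids and covariances
        nk = resp.sum(axis=0)
        weights = nk / n
        for k in range(n_clusters):
            means[k] = (resp[:, k:k + 1] * points).sum(axis=0) / nk[k]
            diff = points - means[k]
            covs[k] = (resp[:, k] * diff.T) @ diff / nk[k] + 1e-6 * np.eye(d)
    return means, covs
```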

Feature-free contours: many active-contour methods assume that object boundaries generate "features" in the image, e.g. image edges. A more realistic assumption is that object boundaries generate discontinuities in the correlations between neighboring grey levels. This assumption leads to efficient collection of relevant image statistics, without human intervention. In addition, the geometry of objects is not known exactly, suggesting that image evidence should be marginalized over small deformations of model shapes. The generative model for the feature-free-contour tracker is based on the above two principles. A Newton-like method allows fast optimization of state variables. The model is complemented by a dynamical component which allows the application of an extended Kalman filter.

The two methods will be demonstrated on the video sequences from PETS 2000 (IEEE workshop on Performance Evaluation in Tracking and Surveillance). The contour-based method will also be demonstrated on a camera pose-refinement task.

A Robust Adaptive Visual System for Multi-Target Tracking,
Pakorn Kaewtrakulpong, Richard Bowden
Vision and Virtual Reality
Department of Systems Engineering
Brunel University, Middlesex, UB8 3PH.

This paper presents an adaptive multiple-target tracker for a surveillance site. The tracker uses a static camera to detect and track moving objects in real time. It consists of two parts: background subtraction and motion tracking. An adaptive multi-colour background model is employed for each pixel in the camera scene. The background subtraction module uses this model to extract hypothesised moving regions from the scene. The regions are then passed through connected-component analysis and shadow elimination. Appearance and motion models are then constructed for each region resulting from the previous stage. The appearance model is based on colour cues of each detected object: by applying consensus colours from the Munsell colour map, the colour distribution of each object can be represented by a normalised histogram. The motion tracking module uses Kalman filters to estimate the current states of the moving objects. A data association algorithm is introduced at this stage to label the measurements according to their objects. This algorithm uses the appearance models to search for the measurement corresponding to each previously detected object; the search is constrained to the predicted area obtained from the Kalman filter. This information is then fed back to the corresponding Kalman filter to update the state of the motion model. The result is a fast, efficient tracking system with unsupervised learning ability that can be applied to real-time multiple-target tracking applications.
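
The predict-associate-update cycle can be sketched as follows. The matrix names are the standard Kalman ones and the gating/intersection scheme is an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Similarity between two normalised colour histograms."""
    return np.minimum(h1, h2).sum()

def associate_and_update(x, P, F, H, Q, R, measurements, model_hist, gate=9.0):
    """One predict / associate / update cycle for one tracked object.

    `measurements` is a list of (position, colour_histogram) pairs.
    Candidates inside a chi-square gate around the Kalman prediction are
    ranked by histogram intersection with the object's appearance model,
    and the best match drives the Kalman update.
    """
    x = F @ x                              # predict state and covariance
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R                    # innovation covariance
    S_inv = np.linalg.inv(S)
    best, best_sim = None, -1.0
    for z, hist in measurements:
        innov = z - H @ x
        if innov @ S_inv @ innov < gate:   # inside predicted search area
            sim = histogram_intersection(hist, model_hist)
            if sim > best_sim:
                best, best_sim = z, sim
    if best is not None:                   # update with the matched measurement
        K = P @ H.T @ S_inv
        x = x + K @ (best - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```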

Growth Models for Shape Data,
John T Kent
University of Leeds

We consider strategies to model the evolution of landmark-based shapes through time. In some ways this problem falls within the framework of longitudinal analysis for multivariate data. However, there are two extra features. Firstly, the shape of an object is invariant under changes in location, scale and rotation, and this property must be incorporated into any models. Secondly, the landmarks lie in a Euclidean space (usually of dimension 2 or 3), and any model of landmark evolution can be interpolated to yield a dynamic deformation of space.
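
The invariance mentioned above is conventionally handled by Procrustes alignment; a standard sketch (the growth models in the talk are built on top of such a shape representation, not given by this function):

```python
import numpy as np

def procrustes_align(X, Y):
    """Ordinary full Procrustes alignment of landmark configuration Y onto X,
    removing the location, scale and rotation differences under which
    shape is invariant.  Returns the aligned copy of Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Yc)     # optimal rotation via SVD
    R = U @ Vt
    scale = s.sum() / (Yc ** 2).sum()       # optimal isotropic scale
    return scale * Yc @ R.T + X.mean(axis=0)
```

After alignment, longitudinal models can operate on the residual (shape) variation alone.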

Practical Issues Regarding the Use of Bayes Theory for Automated Decision Making,
Neil Thacker
Division of Imaging Science & Biomedical Engineering
University of Manchester
Stopford Building
Oxford Rd
M13 9PT

Bayes theory is used as a way of constructing probabilistic decision systems so that prior knowledge can be used in data analysis. It incorporates prior information, in the form of probabilities, in order to ``bias'' the interpretation of the data in the direction of expectation. It therefore has greatest influence when the data under analysis are very weak; under these circumstances a purely data-driven solution may not be directly possible. The problems with the use of Bayes theory are actually quite well known, though those who espouse this approach to data analysis are frequently fiercely protective of it and do not welcome even the suggestion of weaknesses. This talk discusses a range of these limitations and motivates a methodology for predicting the practical achievability of such systems. The talk will be illustrated with several systems which have been suggested in the literature.
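
The point about weak data can be made with Bayes' rule itself over a discrete set of hypotheses (a minimal illustration, not anything from the talk):

```python
import numpy as np

def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of hypotheses: the prior is
    re-weighted by the likelihood of the observed data and renormalised."""
    p = np.asarray(prior, dtype=float) * np.asarray(likelihood, dtype=float)
    return p / p.sum()
```

With a nearly flat likelihood (very weak data) the posterior simply reproduces the prior, which is the sense in which the prior has greatest influence when the data are weak; strongly informative data override it.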

The slides from this presentation are available here.

Using Occlusions to Assist in Updating the State of an Articulated Body for Motion Capture,
Maurice Ringer, Joan Lasenby
Cambridge University,
Dept of Engineering
Cambridge CB2 1PZ

PDF Extended Abstract

A Statistical Model for Human Motion Synthesis,
Luis Molina Tanco, Adrian Hilton
School of EC&M 
University of Surrey
Surrey GU2 5XH. Tel: +44 (0)1483 879838

In this talk we present a system that can synthesise novel motion sequences from a database of motion capture examples. This is achieved through learning a statistical model from the captured data which enables realistic synthesis of new movements by sampling the original captured sequences. New movements are synthesised by specifying the start and end keyframes. The statistical model identifies segments of the original motion capture data to generate novel motion sequences between the keyframes. The advantage of this approach is that it combines the flexibility of keyframe animation with the realism of motion capture data.
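The segment-chaining idea can be sketched as a search over a database of short motion segments. This toy stand-in links segments when one ends near where another begins and finds a chain by breadth-first search; the paper's learned statistical model selects segments probabilistically rather than by a fixed threshold:

```python
import numpy as np
from collections import deque

def synthesise(segments, start_pose, end_pose, tol=0.5):
    """Chain motion-capture segments (each an array of pose vectors) into
    a new sequence running from start_pose to end_pose, or return None."""
    def close(a, b):
        return np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) < tol
    start_ids = [i for i, s in enumerate(segments) if close(s[0], start_pose)]
    queue = deque((i, [i]) for i in start_ids)
    seen = set(start_ids)
    while queue:
        i, path = queue.popleft()
        if close(segments[i][-1], end_pose):               # reached the end keyframe
            return np.vstack([segments[j] for j in path])
        for j, s in enumerate(segments):
            if j not in seen and close(segments[i][-1], s[0]):
                seen.add(j)
                queue.append((j, path + [j]))
    return None                                            # no chain found
```

Because every frame in the output comes from captured data, the synthesised motion inherits the realism of the original sequences.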

PDF Extended Abstract