The British Machine Vision Association and Society for Pattern Recognition 

BibTeX entry

@phdthesis{Gilbert,
  AUTHOR={Andrew Gilbert},
  TITLE={Scalable and Adaptable Tracking of Humans in Multiple Camera Systems},
  SCHOOL={University of Surrey},
}


The aim of this thesis is to track objects on a network of cameras, both within (intra) and across (inter) cameras. The algorithms must be adaptable to change and are learnt in a scalable approach. The cameras are uncalibrated and spatially separated, so tracking must cope with object occlusions, illumination changes, and gaps between cameras. The consistency of object descriptors is examined: in constructing robust appearance histogram descriptors, the histogram bin size, colour space and correlation measures are investigated. The consistency of object appearance is used as a measure of success for the candidate solutions for tracking objects both intra and inter camera. The choice of descriptor strongly affects tracking performance, hence these results are important and are referred to throughout the thesis.

Crowded scenes of people would cause an appearance-based individual tracker to fail. Therefore a novel solution to the problem of tracking people within crowded scenes is presented. The aim is to maintain individual object identity through a scene containing complex interactions and heavy occlusions of people. The strengths of two separate methods are combined: a global object search seeds positions to a localised frame-by-frame tracker to form short tracklets, and the best path trajectory is then found through all the resulting tracklets. The approach relies on a single camera with no ground plane calibration and learns the temporal relationship of object detections for the scene. This two-part method allows robust person tracking through extensive occlusions and crowd interactions.

In addition to tracking objects within crowds, this thesis presents a number of contributions to the problem of tracking objects across cameras. A scalable and adaptable approach is used across the spatially separated, uncalibrated cameras with non-overlapping fields of view (FOV).
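The appearance descriptors investigated above can be illustrated with a minimal sketch: a normalised colour histogram over an object's pixels, compared with histogram intersection. The function names, default bin count and the choice of intersection as the correlation measure are illustrative assumptions; the thesis evaluates several bin sizes, colour spaces and correlation measures rather than any single fixed choice.

```python
import numpy as np

def colour_histogram(pixels, bins=8):
    # pixels: (N, 3) array of per-pixel colour values in [0, 256).
    # bins per channel is a free parameter (bin size is one of the
    # design choices the thesis investigates for descriptor consistency).
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    return hist / hist.sum()  # normalise so the descriptor sums to 1

def histogram_intersection(h1, h2):
    # Similarity in [0, 1]; 1.0 for identical normalised histograms.
    return np.minimum(h1, h2).sum()
```

A descriptor that remains consistent for the same object across frames and cameras (high intersection score) while separating different objects (low score) is what makes the downstream intra- and inter-camera correlation feasible.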
The novel approach fuses three cues, appearance, relative size and movement between cameras, to learn the camera relationships. These relationships weight the observational likelihood to aid correlation of objects between cameras. Individually each cue has low performance, but when fused together a large boost in correlation accuracy is gained. Unlike previous work, a novel incremental learning technique is used, with the three cues learnt in parallel and then fused together to track objects across the spatially separated cameras. Incremental colour calibration is performed between the cameras through transformation matrices. Probabilistic modelling of an object's bounding box between cameras introduces a shape cue based on objects' relative size, while probabilistic links between learnt entry and exit areas on cameras provide the cue of inter-camera movement. The approach requires no colour or environment calibration and does not use batch processing. It learns in an unsupervised manner and increases in accuracy as new evidence is accumulated over time. Extensive testing is performed with seven days of video footage using up to eight cameras, with an hour of ground-truthed data. Together these key developments allow a flexible and adaptable approach to tracking people and objects intra and inter camera.
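The cue-fusion step can be sketched as combining the three per-cue likelihoods into one correlation score for each candidate match between cameras. The weighted-product rule and the function names below are illustrative assumptions for exposition; in the thesis the cue relationships are learnt incrementally and used to weight the observational likelihood, rather than fixed as here.

```python
def fused_likelihood(p_appearance, p_size, p_movement,
                     weights=(1.0, 1.0, 1.0)):
    # Weighted product of the three cue likelihoods, assuming the
    # cues are (approximately) independent. Each individual cue is
    # weak; the product sharpens the combined score.
    wa, ws, wm = weights
    return (p_appearance ** wa) * (p_size ** ws) * (p_movement ** wm)

def best_match(candidates):
    # candidates: list of (object_id, p_app, p_size, p_move) tuples
    # for objects recently seen leaving other cameras; return the id
    # with the highest fused correlation score.
    return max(candidates,
               key=lambda c: fused_likelihood(c[1], c[2], c[3]))[0]
```

For example, a candidate with moderate scores on all three cues can outscore one with a single strong cue, which is the practical benefit of fusing appearance, size and movement rather than relying on any one of them.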