The British Machine Vision Association and Society for Pattern Recognition 

BibTeX entry

  AUTHOR={Dimitrios Makris},
  TITLE={Learning an Activity-Based Semantic Scene Model},
  SCHOOL={City University, London},


This thesis investigates how scene activity observed by fixed surveillance cameras can be modelled and learnt. Activity is modelled through a spatio-probabilistic scene model that contains semantic elements such as entry/exit zones, paths, junctions, routes and stop zones. The spatial nature of the model allows a physical and semantic representation of the scene features, which is useful in applications such as video annotation and contextual databases. The probabilistic nature of the model encodes the variance and the related uncertainty in how the scene features are used, which is useful for activity analysis applications such as motion prediction and atypical motion detection. A variety of models and learning methods are used to represent and automatically derive particular activity-based semantic scene elements. Expectation-Maximisation is used for learning Gaussian Mixture Models, and accumulative statistics in image maps are integrated into the methods presented. A novel route model and an appropriate learning algorithm are also introduced, and a Hidden Markov Model superimposed on the scene model enables activity analysis. The application of the methods is investigated for single cameras and collectively across multiple cameras. In addition, a novel automatic cross-correlation method is introduced that reveals the topology of a network of activities, as observed by a network of uncalibrated cameras. The method is important not only because it provides an integrated activity model for all the cameras, but also because it provides a mechanism to automatically estimate the topology of the camera network, modelling the activity across the “blind” areas of the surveillance system. All the proposed learning algorithms are unsupervised, so the scene model is learnt automatically. Their input is a set of noisy trajectories derived automatically by motion tracking modules attached to each of the cameras.
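
The cross-correlation idea for linking cameras can be illustrated with a minimal sketch. This is not the algorithm of the thesis, only an assumed simplification: disappearance times in an exit zone of one camera and appearance times in an entry zone of another are binned into event counts, and the two series are correlated over a range of time lags. A clear peak suggests that the two zones are connected through a "blind" area, and the lag of the peak gives a typical transition time. The function names, the bin size, the peak-score rule and the synthetic data are all hypothetical.

```python
import numpy as np

def event_series(timestamps, duration, bin_size=1.0):
    """Count events per time bin over [0, duration)."""
    n_bins = int(np.ceil(duration / bin_size))
    counts, _ = np.histogram(timestamps, bins=n_bins,
                             range=(0.0, n_bins * bin_size))
    return counts.astype(float)

def cross_correlation(exits, entries, max_lag):
    """Covariance between exits at bin t and entries at bin t + lag,
    for lag = 0 .. max_lag (lags measured in bins)."""
    x = exits - exits.mean()
    y = entries - entries.mean()
    n = len(x)
    return np.array([np.dot(x[:n - lag], y[lag:]) / (n - lag)
                     for lag in range(max_lag + 1)])

def estimate_link(exits, entries, max_lag, threshold=3.0):
    """Return (peak lag, peak score, linked?).

    The decision rule is an assumption made for this sketch: the peak
    must exceed the median correlation by `threshold` standard deviations.
    """
    cc = cross_correlation(exits, entries, max_lag)
    peak_lag = int(np.argmax(cc))
    score = (cc[peak_lag] - np.median(cc)) / (cc.std() + 1e-12)
    return peak_lag, float(score), bool(score > threshold)

# Synthetic example: targets leave camera A and reappear in camera B
# about 12 seconds later, with some timing noise.
rng = np.random.default_rng(0)
exit_times = np.sort(rng.uniform(0.0, 3600.0, size=200))
entry_times = exit_times + 12.0 + rng.normal(0.0, 0.5, size=200)

duration = 3700.0
exits = event_series(exit_times, duration)
entries = event_series(entry_times, duration)

lag, score, linked = estimate_link(exits, entries, max_lag=60)
print(f"peak lag: {lag} s, score: {score:.1f}, linked: {linked}")
```

Because the sketch uses only event times, it needs neither camera calibration nor correspondence between individual targets, which is in the spirit of the unsupervised setting described above. Real tracking data would be noisier than this synthetic example, so the bin size and threshold would need tuning.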