The British Machine Vision Association and Society for Pattern Recognition 

BibTeX entry

  AUTHOR={Vasileios Zografos},
  TITLE={Pose-invariant, model-based object recognition, using
    linear combination of views and Bayesian statistics},
  SCHOOL={University College London},


This thesis presents an in-depth study of the problem of object recognition. In particular, it addresses the detection of 3-D objects in 2-D intensity images, where the objects may be viewed from a variety of angles. A solution to this problem remains elusive to this day, since it involves dealing with variations in geometry, photometry and viewing angle, as well as noise, occlusions and incomplete data. This work restricts its scope to one particular kind of extrinsic variation: variation of the image due to changes in the viewpoint from which the object is seen.

A technique is proposed and developed to address this problem. It falls into the category of view-based approaches, that is, methods in which an object is represented as a collection of a small number of 2-D views rather than as a full 3-D model. The technique rests on the theoretical observation that the geometry of the set of possible images of an object undergoing 3-D rigid transformations and scaling may, under most imaging conditions, be represented by a linear combination of a small number of 2-D views of that object. It is therefore possible to synthesise a novel image of an object given at least two existing, dissimilar views of the object and a set of linear coefficients that determine how these views are to be combined to produce the new image.

The method works in conjunction with a powerful optimisation algorithm that searches for and recovers the linear combination coefficients that synthesise a novel image as similar as possible to the target scene view. If the similarity between the synthesised and target images is above some threshold, the object is deemed present in the scene, and its location and pose are determined, in part, by the coefficients. The key benefit of this technique is that, because it works directly with pixel values, it avoids the need for problematic low-level feature extraction and for a solution to the correspondence problem.
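For reference, the linear combination of views is commonly written as follows for an affine camera. This parameterisation is an assumption here, and the thesis's own may differ. A point seen at $(x, y)$ in the first stored view and at $(x', y')$ in the second appears in a novel view at $(x'', y'')$, with eight coefficients $a_0,\dots,a_3$ and $b_0,\dots,b_3$. The recognition decision can then be sketched as a search over those coefficients followed by a threshold test. Here $S$ is an unspecified similarity measure between the synthesised image $I_{\mathrm{syn}}$ and the target image $I$, and $\tau$ is the threshold. Both $S$ and $\tau$ are placeholder symbols.

```latex
\begin{aligned}
x'' &= a_0 + a_1 x + a_2 y + a_3 x' \\
y'' &= b_0 + b_1 x + b_2 y + b_3 y' \\[4pt]
(\hat{\mathbf{a}}, \hat{\mathbf{b}}) &= \arg\max_{\mathbf{a},\,\mathbf{b}}
  S\big(I_{\mathrm{syn}}(\mathbf{a}, \mathbf{b}),\, I\big),
\qquad \text{object present if } S\big(I_{\mathrm{syn}}(\hat{\mathbf{a}}, \hat{\mathbf{b}}),\, I\big) > \tau
\end{aligned}
```

The first two lines hold because, for an affine camera, every image coordinate is an affine function of the 3-D point. For two dissimilar views, $x$, $y$ and $x'$ (or $x$, $y$ and $y'$) generically determine that point.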
Because it needs no low-level feature extraction or correspondence search, a linear combination of views (LCV) model is easy to construct and use. It requires only a small number of stored 2-D views of the object in question and the selection of a few landmark points on the object, a process that is easily carried out during the offline model-building stage. In addition, the method is general enough to be applied across a variety of recognition problems and different types of objects.

The development and application of the method are first explored on two-dimensional problems, and the same principles are then extended to 3-D. The method is evaluated on synthetic and real-image datasets containing variations in the objects’ identity and pose. Finally, possible future extensions that incorporate a foreground/background model and lighting variations of the pixels are examined.
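The model-building and recognition steps can be made concrete with a minimal Python sketch. It assumes NumPy, an affine camera, and known landmark correspondences between the two stored views and the target. The names `LCVModel`, `synthesise`, `fit`, `recognise` and `affine_view` are hypothetical, and the sketch makes two deliberate simplifications. First, it works on landmark coordinates rather than pixel intensities. Second, it recovers the coefficients by closed-form least squares instead of the pixel-based optimisation search the thesis describes, with RMS landmark error standing in for image similarity. It therefore illustrates only the geometric claim that a novel view's landmarks are a linear combination of two stored views.

```python
import numpy as np

# Sketch only: LCVModel, affine_view and the RMS criterion are hypothetical
# illustrations, not the thesis implementation. Assumes an affine camera and
# known landmark correspondences between the stored views and the target.


def _design(view1: np.ndarray, view2: np.ndarray, col: int) -> np.ndarray:
    """Rows [1, x, y, x'] (col=0) or [1, x, y, y'] (col=1), one per landmark."""
    ones = np.ones(len(view1))
    return np.column_stack([ones, view1[:, 0], view1[:, 1], view2[:, col]])


class LCVModel:
    """Two stored views' (N, 2) landmark arrays, built once in the offline stage."""

    def __init__(self, view1, view2):
        self.view1 = np.asarray(view1, dtype=float)
        self.view2 = np.asarray(view2, dtype=float)

    def synthesise(self, a, b):
        """Novel landmarks: x'' = a0 + a1 x + a2 y + a3 x', y'' = b0 + b1 x + b2 y + b3 y'."""
        xs = _design(self.view1, self.view2, 0) @ np.asarray(a, dtype=float)
        ys = _design(self.view1, self.view2, 1) @ np.asarray(b, dtype=float)
        return np.column_stack([xs, ys])

    def fit(self, target):
        """Least-squares coefficients (a stand-in for the thesis's optimiser)."""
        target = np.asarray(target, dtype=float)
        a, *_ = np.linalg.lstsq(_design(self.view1, self.view2, 0), target[:, 0], rcond=None)
        b, *_ = np.linalg.lstsq(_design(self.view1, self.view2, 1), target[:, 1], rcond=None)
        return a, b

    def recognise(self, target, threshold):
        """Fit, then accept the object if the RMS landmark error is below threshold."""
        a, b = self.fit(target)
        residual = self.synthesise(a, b) - np.asarray(target, dtype=float)
        rms = float(np.sqrt(np.mean(residual ** 2)))
        return rms < threshold, a, b, rms


def affine_view(points, angle_y, angle_x, scale, shift):
    """Scaled orthographic projection of (N, 3) points, rotated about y then x."""
    cy, sy = np.cos(angle_y), np.sin(angle_y)
    cx, sx = np.cos(angle_x), np.sin(angle_x)
    rot_y = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    rotated = points @ (rot_x @ rot_y).T
    return scale * rotated[:, :2] + np.asarray(shift, dtype=float)


# Usage: two stored views and a novel view of the same synthetic "object".
rng = np.random.default_rng(0)
object_pts = rng.normal(size=(20, 3))
view1 = affine_view(object_pts, 0.0, 0.0, 1.0, (0.0, 0.0))
view2 = affine_view(object_pts, 0.5, 0.2, 1.0, (1.0, 2.0))
novel = affine_view(object_pts, 0.25, -0.3, 1.3, (-3.0, 0.5))
model = LCVModel(view1, view2)
present, a, b, rms = model.recognise(novel, threshold=1e-6)

# A different point set seen by the same camera should not fit the model.
other = affine_view(rng.normal(size=(20, 3)), 0.25, -0.3, 1.3, (-3.0, 0.5))
other_present, _, _, other_rms = model.recognise(other, threshold=1e-6)
```

The novel view here is an exact affine projection of the same points, so the fitted residual is at floating-point level and the object is accepted. Landmarks of a different point set leave a large residual and are rejected. Replacing the least-squares fit with a search over pixel similarity, as the thesis describes, would remove the need for landmarks in the target image.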