British Machine Vision Association and Society for Pattern Recognition
Vision in the Media
One Day BMVA Technical Meeting to be held on the 10th at the British Institute of Radiology, 36 Portland Place, London.
Chairperson: Bernard Buxton (UCL)
ABSTRACT: The aim of this meeting is to bring together a number of talks on the application of novel computer vision, graphics and related technological approaches to the visual media. The fusion of computer vision and graphics to create different kinds of augmented world is coming within reach of available imaging and computing technology. The meeting brings together contributions from researchers and application developers to give a broad summary of these recent advances and to provide a forum for discussion of the future direction of the field.
10:30 Registration and coffee
10:55 Introduction and welcome, Bernard Buxton (UCL)
11:00 Television in the 21st century (21st Century Box), Richard Storey (BBC Research & Development)
11:40 Motion and Colour-based Segmentation Tools for Post-Production Applications, Graeme Jones (Kingston University)
12:20 View Synthesis and Creating Immersive Video Objects, Stephen Pollard (HP Laboratories)
14:00 Towards Markerless Based Motion Capture, Richard Bowden (Brunel University)
14:40 Immersive Participatory Media, Projecting User Performances into Dynamic Media Events, Sharon Springel (University of Cambridge)
15:40 3D Imaging for Metrology and Display, Neil Davies (De Montfort University)
16:20 Summary and discussion
16:30 Closing remarks and finish
Television in the 21st Century (21st Century Box)
Richard Storey (BBC Research & Development)
The television and film production industries enjoy a monopoly in their market niche: linear programmes crafted lovingly for their entertainment, information and educational values. Broadcast and film moguls may come and go, but there will always be that demand to sit back and enjoy a good drama, sports event or documentary. Or will there?
There is a new generation growing up whose imaginations have been formed not by television, radio and the written word but by interactive media and immersive 3D video games. This is no longer a sub-culture. The new media's power to captivate is plain to see, even for today's limited range of content. Given that a child's formative experiences become expectations in later life, the newly unfolding media are at best a tremendous opportunity but at worst could sound the death knell of the content production industry as we know it.
Motion and Colour-based Segmentation Tools for Post-Production Applications
Graeme Jones (Kingston University)
Many post-production effects still rely on the intensely manual procedure of rotoscoping. In this work, the separation of foreground elements, such as actors, from arbitrary backgrounds rather than from a blue screen is accomplished by accurately estimating the visual motion induced by a moving camera. The optical-flow field of the background is recovered using a parametric motion model (motivated by the three-dimensional pan, tilt and zoom motion of a camera) embedded in a spatiotemporal least-squares minimisation framework. A maximum a posteriori probability (MAP) approach is then used to assign each pixel to one of four classes (background, uncovered, covered and foreground) defined relative to the background element. The prior class probabilities for each pixel are recovered by modelling (training and evolving) colour distribution models of the background and foreground elements.
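The abstract does not give implementation details, but the MAP class assignment it describes can be sketched in a few lines. The four-class labels come from the abstract; the Gaussian colour models, their parameters and all function names below are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Hypothetical per-class colour models: mean and diagonal variance in RGB.
# The four classes mirror the abstract's pixel memberships.
CLASSES = ["background", "uncovered", "covered", "foreground"]

def gaussian_likelihood(pixels, mean, var):
    """Diagonal-Gaussian likelihood of each RGB pixel under one colour model."""
    diff = pixels - mean
    log_l = -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=-1)
    return np.exp(log_l)

def map_classify(pixels, means, variances, priors):
    """Assign each pixel the class maximising prior * likelihood (MAP)."""
    scores = np.stack(
        [priors[k] * gaussian_likelihood(pixels, means[k], variances[k])
         for k in range(len(CLASSES))], axis=-1)
    return np.argmax(scores, axis=-1)

# Toy example: two pixels that strongly match the background and
# foreground colour models respectively.
means = np.array([[0.1, 0.7, 0.1],    # greenish background
                  [0.5, 0.5, 0.5],    # uncovered (placeholder model)
                  [0.4, 0.4, 0.4],    # covered (placeholder model)
                  [0.8, 0.2, 0.2]])   # reddish foreground
variances = np.full((4, 3), 0.01)
priors = np.array([0.4, 0.1, 0.1, 0.4])

pixels = np.array([[0.1, 0.72, 0.1], [0.78, 0.2, 0.22]])
labels = map_classify(pixels, means, variances, priors)
print([CLASSES[i] for i in labels])  # → ['background', 'foreground']
```

In the actual system the priors are per-pixel and evolve over time, and the motion estimate constrains which transitions between classes are plausible; the sketch shows only the colour-driven MAP decision.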
View Synthesis and Creating Immersive Video Objects
Stephen Pollard (HP Laboratories)
A number of researchers have explored ways of constructing static and temporally varying immersive scenes using real-world image data alone. This presentation will describe and demonstrate work carried out in the Digital Media Department of Hewlett-Packard Research Labs, Bristol.
We have built a number of fully immersive 3D environments based upon extensions to the QTVR panorama-browsing concept. The system consists of two parts: a mark-up process that identifies corresponding polygons between a number of panoramas and makes their occlusion relationships explicit, and a browser that allows the full exploration of the region covered by the panoramas in real time. To illustrate the potential exploitation of the approach, a Netscape plug-in version of the browser has also been developed.
Secondly, we have developed a computationally simple and fully automatic method for view synthesis (suitable for small-baseline applications) based upon edge transfer. The technique uses edge correspondences to transfer an edge sketch to a novel view; this forms the starting point for an efficient morphing algorithm that operates one scan-line at a time to perform the final image synthesis. The result gives an immersive experience and a sense of viewing a real environment.
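The scan-line morphing step can be illustrated with a one-dimensional sketch: corresponding edge positions on a scan-line in the two source views act as warp anchors, the anchors are interpolated for the novel view, and the two warped lines are cross-dissolved. This is a generic feature-based morph under assumed conventions (edge lists include both line endpoints), not HP's implementation.

```python
import numpy as np

def morph_scanline(line_a, line_b, edges_a, edges_b, t):
    """Blend one scan-line at parameter t in [0, 1], using corresponding
    edge positions (including both endpoints) as warp anchors."""
    n = len(line_a)
    xs = np.arange(n, dtype=float)
    # Interpolated edge positions define the geometry of the novel view.
    edges_t = (1 - t) * np.asarray(edges_a, float) + t * np.asarray(edges_b, float)
    # Map each output pixel back into both source scan-lines ...
    src_a = np.interp(xs, edges_t, edges_a)
    src_b = np.interp(xs, edges_t, edges_b)
    # ... resample each source line there, then cross-dissolve.
    sample_a = np.interp(src_a, xs, line_a)
    sample_b = np.interp(src_b, xs, line_b)
    return (1 - t) * sample_a + t * sample_b

# Toy scan-lines with one interior edge that moves between the views.
line_a = np.array([0., 1., 2., 3., 4., 5., 6., 7.])
line_b = line_a * 2
novel = morph_scanline(line_a, line_b, edges_a=[0, 3, 7], edges_b=[0, 5, 7], t=0.5)
```

At t = 0 the warp is the identity and the routine returns `line_a` unchanged (and `line_b` at t = 1), which is the usual sanity check for a feature-based morph.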
We have also used these techniques to develop immersive video objects (called 3D video sprites for short), an extension of existing digital blue-screen techniques that allows greater flexibility in rendering video footage against computer-generated three-dimensional backdrops. In current systems the location of the live-action camera is either fixed or follows a precisely calibrated path during capture, and the rendered scene is required to be consistent with it. 3D video sprites, on the other hand, can dynamically combine blue-screen footage captured simultaneously from a number of viewpoints to create video sequences from novel viewing directions. This allows the 3D video sprite to be treated more like a true graphic object and allows the specific vantage point to be determined after the fact.
Towards Markerless Based Motion Capture
Richard Bowden (Brunel University)
The human visual system is adept at recognising the position and pose of an object, even when presented with a monoscopic view. In low lighting conditions in which only a silhouette is visible, it is still possible for a human to deduce the pose of an object, through structural knowledge of the human body and its articulation.
A similar internal model can be constructed mathematically to represent a human body and the possible ways in which it can deform. This information, encapsulated within a Point Distribution Model (PDM), can be used to locate and track a body. By introducing additional information to the PDM relating to the anatomical structure of the body, a direct mapping between skeletal structure and projected shape can be achieved.
This work investigates the feasibility of such an approach to the reconstruction of 3D structure from a single view. To further aid the tracking and reconstruction process, additional information about the location of both the head and hands is combined into the model. This helps disambiguate the model and provides useful information for both its initialisation and tracking within the image plane.
Fast non-linear approximation techniques are used to further constrain the PDM, and we show how the 3D pose and motion of a human body can be reconstructed from a monoscopic camera sequence.
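A Point Distribution Model itself is a standard construction (mean shape plus principal modes of variation, with each mode parameter constrained to about three standard deviations), and can be sketched briefly. The toy landmark data and the function names below are illustrative assumptions, not the body model used in this work.

```python
import numpy as np

def fit_pdm(shapes, num_modes=2):
    """Build a Point Distribution Model from aligned training shapes
    (each shape a flattened vector of 2D landmark coordinates)."""
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    order = np.argsort(eigvals)[::-1][:num_modes]   # keep largest modes
    return mean, eigvecs[:, order], eigvals[order]

def synthesise(mean, modes, eigvals, b):
    """Generate a plausible shape; clamping each parameter to +/-3 standard
    deviations is the usual PDM constraint that keeps shapes body-like."""
    limit = 3 * np.sqrt(np.maximum(eigvals, 0))
    b = np.clip(b, -limit, limit)
    return mean + modes @ b

# Toy training set: a noisy unit square's four landmarks (x1,y1,...,x4,y4).
rng = np.random.default_rng(0)
base = np.array([0., 0., 1., 0., 1., 1., 0., 1.])
shapes = base + 0.05 * rng.standard_normal((20, 8))
mean, modes, eigvals = fit_pdm(shapes)
print(np.allclose(synthesise(mean, modes, eigvals, np.zeros(2)), mean))  # prints True
```

Tracking then becomes a search over the mode parameters `b` (plus pose) for the shape that best explains the image; the non-linear constraints mentioned above restrict that search to anatomically valid deformations.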
Immersive Participatory Media, Projecting User
Performances into Dynamic Media Events
Sharon Springel (University of Cambridge)
Developments in such areas as telepresence, real-time computer imaging and advanced network capabilities are all progressing rapidly. What has traditionally been lacking, however, is a clear overall vision of how all of this diverse technological momentum might be successfully harnessed by the user community itself, in order to achieve the true paradigm breakthrough that 'convergence' has been anticipating for some time now. At Cambridge University's Centre for Communications Systems Research (CCSR), we intend to address this need by exploiting these technological developments to devise new systems that will directly empower individuals, allowing them to make use of their own innate creativity by casting them in active roles within unique shared dramatic experiences.
Through such systems, everything from dramatic entertainment through education could potentially be transformed.
3D Imaging for Metrology and Display
Neil Davies (De Montfort University)
Integral imaging is a method of recording full-parallax 3D images which offers an alternative to stereoscopic techniques. Stereoscopic imaging is the most widely used method for display and for 3D remote vision and inspection. However, stereoscopic systems using two views, or multi-view systems with as many as sixteen views, have spatial-acuity limitations when applied to metrology and human-factor drawbacks where prolonged viewing occurs. Integral imaging overcomes these problems by generating a true optical model of the scene being imaged.
The advanced form of integral imaging developed by the 3D and Biomedical Engineering Imaging Group at De Montfort University enables 1:1-scale images of extremely deep scenes to be generated in real time by a single integral camera.
The image-depth limitations of integral imaging, first proposed by Lippmann in 1908, have been overcome by the novel use of macro/micro lens combinations, which enables high lateral resolution with high spatial acuity to be achieved over extended scene depths.
The nature of the recorded images allows information sampling to be carried out and the image can be replayed on standard resolution electronic displays without loss of 3D information. Stereoscopic systems require a minimum of one pixel per view to replay a scene with 3D integrity, a limitation which does not apply to integral images.
In this talk the evolution of this camera will be explained, its image characteristics described, and its possible application to 3D TV, remote inspection and metrological systems discussed.