Segmentation of Dynamic Scenes with Distributions of Spatiotemporally Oriented Energies
In Proceedings of the British Machine Vision Conference 2014
Abstract
In video segmentation, disambiguating appearance cues by grouping similar motions or dynamics is potentially powerful, though non-trivial. Changes of appearance can arise from motion and non-motion phenomena alike, from simple translating objects to complex dynamic textures or deformations of non-rigid objects. While the former are easily captured by optical flow, phenomena such as a dissipating cloud of smoke, or flickering reflections on water, do not satisfy the assumption of brightness constancy and cannot be modelled with rigid displacements in the image. We propose a robust representation of image dynamics as histograms of motion energy (HoME), computed from convolutions of the video with spatiotemporal filters. These features capture a wide range of dynamics and handle problems previously studied separately (motion segmentation and dynamic texture segmentation). They thus offer a potential solution for a new class of problems in which both effects appear in the same scene. Our representation of image dynamics is integrated in a graph-based segmentation framework and combined with colour histograms to represent the appearance of regions. In the case of translating and occluding segments, the proposed features additionally serve to characterize the motion of the boundary between pairs of segments, making it possible to identify the occluder and infer a local depth ordering. The resulting segmentation method is completely model-free and unsupervised; it achieves state-of-the-art results on the SynthDB dataset for dynamic texture segmentation and on the MIT dataset for motion segmentation, and competitive performance on the CMU dataset for occlusion boundaries, well above baseline methods.
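To illustrate the core idea of the representation, the following is a minimal numpy sketch of histograms of oriented spatiotemporal energy over a region. It is not the paper's filter bank: the choice of filters (a blur plus a simple central-difference derivative steered in the x-t plane), the orientation sampling, the border cropping, and the L1 normalisation are all simplifying assumptions made here for illustration.

```python
import numpy as np

def conv1d_axis(vol, kernel, axis):
    """Convolve a 3D volume with a 1D kernel along one axis ('same' size)."""
    return np.apply_along_axis(lambda m: np.convolve(m, kernel, mode="same"),
                               axis, vol)

def oriented_energy(video, orientations):
    """Rectified (squared) responses of derivative filters steered to
    orientations in the x-t plane. video has shape (T, H, W)."""
    g = np.array([1., 4., 6., 4., 1.]); g /= g.sum()  # binomial blur
    d = np.array([-1., 0., 1.]) / 2.                  # central difference
    smooth = video
    for ax in range(3):                               # blur t, y, x
        smooth = conv1d_axis(smooth, g, ax)
    dt = conv1d_axis(smooth, d, axis=0)               # temporal derivative
    dx = conv1d_axis(smooth, d, axis=2)               # spatial derivative
    # Steer the derivative toward each spatiotemporal orientation and
    # square it for a phase-insensitive energy measure.
    return np.stack([(np.cos(th) * dx + np.sin(th) * dt) ** 2
                     for th in orientations])         # (K, T, H, W)

def home_histogram(video, orientations, eps=1e-9):
    """L1-normalised histogram of oriented energies over the whole clip,
    with borders cropped to reduce zero-padding artifacts."""
    e = oriented_energy(video, orientations)[:, 3:-3, 3:-3, 3:-3]
    h = e.reshape(len(orientations), -1).sum(axis=1)
    return h / (h.sum() + eps)

# Demo: a static texture vs. the same texture translating over time.
x = np.arange(16)
frame = np.sin(2 * np.pi * x / 8.0)[None, :].repeat(16, axis=0)   # (H, W)
static = np.stack([frame] * 12)                                   # no motion
moving = np.stack([np.roll(frame, t, axis=1) for t in range(12)]) # translation
thetas = [0.0, np.pi / 4, np.pi / 2]   # pure-spatial ... pure-temporal
h_static = home_histogram(static, thetas)
h_moving = home_histogram(moving, thetas)
# The moving clip should place more relative energy at the temporal
# orientation (thetas[2]) than the static one.
```

A real segmentation pipeline would compute such histograms per region (or per superpixel) and compare them, e.g. with a histogram distance inside a graph-based grouping framework, as the abstract describes.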
Files
Extended Abstract (PDF, 1 page, 1.7M)
Paper (PDF, 12 pages, 2.9M)
Supplemental Materials (ZIP, 4.7M)