Adaptive Structured Pooling for Action Recognition

Svebor Karaman, Lorenzo Seidenari, Shugao Ma, Alberto Del Bimbo and Stan Sclaroff

In Proceedings of the British Machine Vision Conference 2014


In this paper, we propose an adaptive structured pooling strategy for action recognition in videos. Our method identifies several spatio-temporal pooling regions, each corresponding to a consistent spatial and temporal subset of the video. Each subset yields a pooling weight map and is represented as a Fisher vector computed from the soft-weighted contributions of all dense trajectories evolving in it. We further represent each video as a graph defined over multiple granularities of spatio-temporal subsets. The graphs extracted from all videos are then compared with an efficient graph matching kernel. Our approach does not rely on a fixed partitioning of the video. Moreover, the graph structure captures both spatial and temporal relationships between the spatio-temporal subsets. Experiments on the UCF Sports and HighFive datasets show performance above the state of the art.
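To make the soft-weighted pooling step concrete, the following is a minimal sketch and not the authors' implementation. It assumes a diagonal-covariance GMM with parameters `pi`, `mu`, `sigma`, one descriptor per dense trajectory in `X`, and one pooling weight per trajectory in `w` (as would be read from a region's pooling weight map). All of these names are hypothetical. Only the gradient with respect to the GMM means is computed; the variance terms and the usual power and L2 normalizations are omitted for brevity.

```python
import numpy as np

def gmm_posteriors(X, pi, mu, sigma):
    """Soft assignment of each descriptor to each GMM component.

    X: (N, D) descriptors; pi: (K,) mixture weights;
    mu, sigma: (K, D) means and standard deviations (diagonal covariance).
    """
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]   # (N, K, D)
    # Log-density up to a constant that cancels in the normalization below.
    log_prob = -0.5 * np.sum(diff ** 2, axis=2) - np.sum(np.log(sigma), axis=1)
    log_post = np.log(pi)[None, :] + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)               # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)                 # (N, K)

def weighted_fisher_vector(X, w, pi, mu, sigma):
    """First-order Fisher vector where descriptor n contributes with weight w[n]."""
    total = w.sum()
    if total <= 0:
        raise ValueError("pooling weights must have a positive sum")
    gamma = gmm_posteriors(X, pi, mu, sigma)                      # (N, K)
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]   # (N, K, D)
    # Weighted sum over descriptors of posterior times normalized deviation.
    G = np.einsum("nk,nkd->kd", w[:, None] * gamma, diff)         # (K, D)
    G /= total * np.sqrt(pi)[:, None]
    return G.ravel()                                              # (K * D,)
```

With this formulation, a hard region is the special case where `w` is 0 or 1, and a trajectory near a region boundary can contribute to several regions with fractional weights.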


Poster Session


Extended Abstract (PDF, 1 page, 316K)
Paper (PDF, 12 pages, 6.5M)
Supplemental Materials (ZIP, 2.3M)
Bibtex File


Svebor Karaman, Lorenzo Seidenari, Shugao Ma, Alberto Del Bimbo, and Stan Sclaroff. Adaptive Structured Pooling for Action Recognition. Proceedings of the British Machine Vision Conference. BMVA Press, September 2014.


	title = {Adaptive Structured Pooling for Action Recognition},
	author = {Karaman, Svebor and Seidenari, Lorenzo and Ma, Shugao and Del Bimbo, Alberto and Sclaroff, Stan},
	year = {2014},
	booktitle = {Proceedings of the British Machine Vision Conference},
	publisher = {BMVA Press},
	editor = {Valstar, Michel and French, Andrew and Pridmore, Tony},
	doi = { }