Return of the Devil in the Details: Delving Deep into Convolutional Nets

Ken Chatfield, Karen Simonyan, Andrea Vedaldi and Andrew Zisserman

In Proceedings of the British Machine Vision Conference 2014


The latest generation of Convolutional Neural Networks (CNNs) has achieved impressive results on challenging image recognition and object detection benchmarks, significantly raising the interest of the community in these methods. Nevertheless, it is still unclear how different CNN methods compare with each other and with previous state-of-the-art shallow representations such as the Bag-of-Visual-Words and the Improved Fisher Vector. This paper conducts a rigorous evaluation of these new techniques, exploring different deep architectures, comparing them on common ground, and identifying and disclosing important implementation details. We identify several useful properties of CNN-based representations, including the fact that the dimensionality of the CNN output layer can be reduced significantly without an adverse effect on performance. We also identify aspects of deep and shallow methods that can be successfully shared. A particularly significant one is data augmentation, which boosts the performance of shallow methods in a way analogous to that observed with CNN-based methods. Finally, we plan to provide the configurations and code that achieve state-of-the-art performance on the PASCAL VOC Classification challenge, along with alternative configurations that trade off performance, computation speed and compactness.


Machine Learning


Extended Abstract (PDF, 1 page, 119K)
Paper (PDF, 12 pages, 327K)



Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Return of the Devil in the Details: Delving Deep into Convolutional Nets. Proceedings of the British Machine Vision Conference. BMVA Press, September 2014.


@inproceedings{Chatfield14,
	title = {Return of the Devil in the Details: Delving Deep into Convolutional Nets},
	author = {Chatfield, Ken and Simonyan, Karen and Vedaldi, Andrea and Zisserman, Andrew},
	year = {2014},
	booktitle = {Proceedings of the British Machine Vision Conference},
	publisher = {BMVA Press},
	editor = {Valstar, Michel and French, Andrew and Pridmore, Tony},
	doi = { }
}