Associating locations from wearable cameras
In Proceedings British Machine Vision Conference 2014
Abstract
In this paper, we address a specific use-case of wearable or handheld camera technology related to indoor navigation. The main question we address is whether it is possible to crowdsource navigational data in the form of video sequences captured from wearable cameras. Without using geometric inference techniques (such as SLAM), we test video data for navigational content, and algorithms for extracting that content. We do not include tracking in this evaluation: our purpose is to explore the hypothesis that visual content, on its own, contains cues that can be mined to infer a person's location. We test this hypothesis by estimating the positional error inferred during one journey with respect to other journeys along the same approximate path. The contribution of this paper is threefold. First, we propose alternative methods for video feature extraction that identify candidate matches between query sequences and a database of previously acquired sequences. Second, we propose an evaluation methodology that estimates the error distributions in position inference with respect to the ground truth. In the evaluation we compare standard approaches in the retrieval context, such as SIFT and HOG3D, to establish positional estimates. The final contribution is a publicly available database comprising over 90,000 frames of video sequences with positional ground truth in the form of position along a path. The data was acquired along more than 3 km of indoor journeys with a hand-held device (Nexus 4) and a wearable device (Google Glass).
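The core idea of appearance-only position inference can be sketched as a nearest-neighbour retrieval: each database frame carries a feature descriptor and a ground-truth position along the path, and a query frame's position is estimated from its best-matching descriptor. The sketch below is a minimal, hypothetical illustration using random stand-in descriptors (not the paper's SIFT/HOG3D pipeline); all names and parameters are assumptions for illustration only.

```python
import numpy as np

# Hypothetical sketch: each database frame has a descriptor (here a
# random stand-in for a SIFT- or HOG3D-derived feature) and a
# ground-truth position along the path, in metres. A query frame's
# position is estimated as the position of its nearest-neighbour
# descriptor; positional error is measured against ground truth.

rng = np.random.default_rng(0)

n_db = 500                                # database frames
db_pos = np.linspace(0.0, 100.0, n_db)    # metres along the path
db_desc = rng.normal(size=(n_db, 128))    # stand-in descriptors

def estimate_position(query_desc, db_desc, db_pos):
    """Nearest-neighbour match in descriptor space -> position."""
    dists = np.linalg.norm(db_desc - query_desc, axis=1)
    return db_pos[np.argmin(dists)]

# Simulate a query: a database frame's descriptor plus small noise.
true_idx = 250
query = db_desc[true_idx] + rng.normal(scale=0.1, size=128)

est = estimate_position(query, db_desc, db_pos)
err = abs(est - db_pos[true_idx])   # positional error vs ground truth
print(f"estimated position {est:.1f} m, error {err:.2f} m")
```

Repeating this over many queries yields an error distribution of the kind the paper's evaluation methodology characterises; the paper's actual matching operates on sequences rather than single frames.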
Files
Extended Abstract (PDF, 1 page, 893K)
Paper (PDF, 13 pages, 1.1M)