Thursday, September 2

8:00 — 9:00


9:00 — 10:00


Keynote — Prof. Ian Akyildiz, Georgia Tech

10:00 — 10:20


10:20 — 12:20

Oral Session — Architectures and Protocols for Camera Networks

A Resource-Aware Distributed Event Space for Pervasive Smart Camera Networks

Wolfgang Schriebl and Bernhard Rinner, Institute of Networked and Embedded Systems, Klagenfurt University

Pervasive smart cameras (PSC) are an emerging technology with the goal of providing user-centric and ubiquitous visual sensor networks. System autonomy and resource-awareness are challenging requirements, making local image processing for event-based communication a necessary design constraint. In this paper we present the distributed event space (DES), a middleware service for the resource-aware management of distributed event data. The DES architecture is based on a decentralized tuple space that describes local events by tuples consisting of the position and time of the events as well as a set of features describing the detected objects. The DES supports prompt distribution of detected local events and application-specific functions for sophisticated event filtering. We present a distributed tracking application to demonstrate its applicability.
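The tuple-based event representation described in the abstract can be sketched in a few lines of Python. The class and field names below are hypothetical illustrations of the idea (events as position/time/feature tuples, queried with application-specific filters), not the actual DES API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EventTuple:
    """A detected local event: where, when, and what was seen."""
    position: tuple        # (x, y) in a shared world frame
    time: float            # detection timestamp
    features: frozenset    # descriptors of the detected object

class EventSpace:
    """Toy tuple space: cameras publish events, peers query with filters."""
    def __init__(self):
        self._tuples = []

    def publish(self, event):
        self._tuples.append(event)

    def query(self, predicate):
        """Return all events matching an application-specific filter."""
        return [e for e in self._tuples if predicate(e)]

# A camera publishes two detections; a tracker asks for recent "person" events.
space = EventSpace()
space.publish(EventTuple((1.0, 2.0), 10.0, frozenset({"person", "red"})))
space.publish(EventTuple((4.0, 0.5), 11.0, frozenset({"car"})))
recent_people = space.query(lambda e: "person" in e.features and e.time > 9.0)
```

In a decentralized realization each camera would hold a local replica and forward published tuples to its neighbors; the filter predicate is where application-specific event filtering plugs in.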

Occlusion-aware multiple camera reconfiguration

Claudio Piciarelli, Christian Micheloni and Gian Luca Foresti, University of Udine

The paper deals with the problem of camera network reconfiguration. In particular, the case of Pan-Tilt-Zoom (PTZ) cameras is considered, and a method is proposed to automatically change the pan, tilt and zoom parameters in order to maximize the coverage of relevant portions of the observed environment. Here, the "relevant portions" are defined in terms of activity maps, which measure the passage of moving objects over a map of the monitored scene; however, the method can be applied to arbitrary relevance maps. Moreover, occlusions are explicitly handled, so that the map is different for each camera, depending on which portions of the scene are visible from a given point of view. The proposed technique approximates the observed zones with ellipses and finds a locally optimal solution using the Expectation-Maximization algorithm. In order to avoid unfeasible solutions (ellipses that cannot be obtained by any PTZ configuration), the computation is performed in a suitable space where the geometric constraints due to the camera position vanish.

A Fuzzy Model for Coverage Evaluation of Cameras and Multi-Camera Networks

Aaron Mavrinac, Jose Luis Alarcon Herrera and Xiang Chen, University of Windsor

A comprehensive and intuitive three-dimensional task-oriented coverage model for cameras and multi-camera networks based on fuzzy sets is presented. The model captures the vagueness inherent in the concept of visual coverage. At present, the model can be used to evaluate, given a scene model and an objective, the coverage performance of a given camera or multi-camera network configuration as a single numerical metric. Plans to use the model for optimal camera placement and other problems involving coverage are discussed. Examples of qualitative experimental validation of the coverage model are presented.
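A fuzzy coverage metric in the spirit of this abstract might combine graded per-criterion memberships with a t-norm and average them over sampled scene points. The membership functions, thresholds and the min t-norm below are invented for the sketch and are not the paper's model:

```python
def clamp(x):
    return max(0.0, min(1.0, x))

def mu_resolution(dist, ideal=3.0, falloff=4.0):
    """Fuzzy degree that a point is imaged at adequate resolution:
    1 at the ideal distance, decaying linearly with |dist - ideal|."""
    return clamp(1.0 - abs(dist - ideal) / falloff)

def mu_view_angle(angle_deg, max_angle=60.0):
    """Fuzzy degree that the viewing angle is acceptable."""
    return clamp(1.0 - angle_deg / max_angle)

def coverage(points):
    """Scene coverage as the mean fuzzy membership over sampled points,
    combining the criteria with the standard min t-norm."""
    degrees = [min(mu_resolution(d), mu_view_angle(a)) for d, a in points]
    return sum(degrees) / len(degrees)

# Three sampled scene points: (distance to camera, viewing angle in degrees).
score = coverage([(3.0, 0.0), (5.0, 30.0), (7.0, 80.0)])
```

The single scalar `score` plays the role of the abstract's "single numerical metric"; a multi-camera version would take, per point, the maximum membership over all cameras before averaging.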

Matching in Camera Networks using Projective Joint Invariant Signatures

Raman Arora, Charles R. Dyer, Yu Hen Hu and Nigel Boston, University of Washington

An efficient method based on projective joint invariant signatures is presented for distributed matching of curves in a camera network. The fundamental projective joint invariants for curves in real projective space are the volume cross-ratios. A curve in m-dimensional projective space is represented by a signature manifold comprising n-point projective joint invariants, where n is at least m+2. The signature manifold can be used to establish the equivalence of two curves in projective space. However, without correspondence between the two curves, matching signature manifolds is a computational challenge. In this paper we overcome this challenge by finding discriminative sections of signature manifolds consistently across varying viewpoints and scoring the similarity between these sections. This motivates a simple yet powerful method for distributed curve matching in a camera network. Experimental results with real data demonstrate the classification performance of the proposed algorithm with respect to the size of the sections of the invariant signature under various noise conditions.
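The invariance the signatures build on can be demonstrated with the classic four-point cross-ratio, the one-dimensional special case of the volume cross-ratios mentioned in the abstract. The sketch below checks numerically that the cross-ratio survives an arbitrary 1D projective transformation (the transformation parameters are arbitrary choices for the demo):

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points (scalar coordinates on a
    line): the fundamental projective invariant."""
    return ((a - c) * (b - d)) / ((b - c) * (a - d))

def homography_1d(x, h=(2.0, 1.0, 0.5, 3.0)):
    """A 1D projective transformation x -> (ax + b) / (cx + d)."""
    a, b, c, d = h
    return (a * x + b) / (c * x + d)

pts = [0.0, 1.0, 2.0, 4.0]
before = cross_ratio(*pts)                               # 1.5 for these points
after = cross_ratio(*[homography_1d(x) for x in pts])    # equal up to rounding
```

Because any camera viewpoint change induces such a projective map on the image of a line, signatures built from cross-ratios of sampled curve points are the same in every view, which is what makes correspondence-free matching possible in principle.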

Fault Detection, Correction, and Tolerance for Collaborative Target Localization in Visual Sensor Networks

Mahmut Karakaya and Hairong Qi, University of Tennessee

Collaboration in visual sensor networks (VSNs) is essential not only to compensate for the limitations of each sensor node but also to tolerate inaccurate information generated by faulty sensors in the network. Fault tolerance in VSNs is more challenging than in conventional scalar sensor networks (SSNs) because of the directional sensing nature of cameras and the existence of visual occlusion. This paper focuses on the design of a collaborative target localization algorithm in VSNs that would not only accurately localize targets but also detect faults in camera orientation, tolerate these errors and further correct them before they cascade. Targets are localized by distributed camera nodes integrating the so-called certainty maps generated at each node, which record target non-existence information within the camera's field of view. Based on the locations of detected targets in the final certainty map, we then construct a generative image model in each camera that estimates the camera orientation, detects inaccuracies in camera orientations, and corrects them. Based on results obtained from both simulation and real experiments, we show that the proposed fault-tolerant method is effective in terms of localization accuracy as well as fault detection and correction performance.
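The certainty-map idea (each camera rules out ground-plane cells where it is certain no target exists; fusion keeps only cells no camera has ruled out) can be sketched as follows. The grid size and maps are toy values, and the sketch omits the paper's fault detection and correction stages:

```python
def fuse_certainty_maps(maps):
    """Fuse per-camera certainty maps over a discretized ground plane.
    Each map marks cells where that camera is certain NO target exists
    (True = 'target cannot be here'). A target can only survive in cells
    that no camera has ruled out, so fusion is an element-wise OR of
    non-existence followed by negation."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[not any(m[r][c] for m in maps) for c in range(cols)]
            for r in range(rows)]

# Two 2x3 toy maps: True = that camera is certain the cell is empty.
cam1 = [[True, False, True], [True, True, False]]
cam2 = [[True, False, False], [True, True, True]]
possible = fuse_certainty_maps([cam1, cam2])
candidates = [(r, c) for r in range(2) for c in range(3) if possible[r][c]]
```

Note the asymmetry that makes this robust to occlusion: a camera only ever asserts *non*-existence in cells it can actually see as empty, so an occluded camera simply contributes no constraint rather than a wrong one.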

12:20 — 14:20


14:20 — 16:20

Oral Session — Surveillance

A Learning Approach to Interactive Browsing of Surveillance Content

Anders Jonsson, Christophe Parisot and Christophe De Vleeschouwer, Universitat Pompeu Fabra

In this paper, we present a novel application for interactive browsing of (recorded) surveillance content. The application is based on user feedback and enables an operator to switch between camera views that are likely to contain the same activity. Our system relies on off-the-shelf background-subtraction activity detection mechanisms. We use two techniques from machine learning to automatically learn the topology of surveillance camera networks. The first technique identifies connections between camera views for which objects are temporarily out of view, while the second technique identifies overlap between views. Testing on an actual surveillance camera network suggests that the approach is both accurate and robust, despite the simplicity of the involved computer vision methods.
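The first technique the abstract mentions (identifying connections between views for which objects are temporarily out of view) can be approximated by histogramming the time lags between exits from one camera and entries into another: a sharp peak at some lag suggests the two views are topologically linked. This is a hedged sketch of the general idea, not the paper's exact method:

```python
from collections import defaultdict

def transit_histogram(exits, entries, max_lag=10.0):
    """Histogram of exit-to-entry transit times (in seconds, binned to
    the nearest second) between two camera views. A clearly peaked
    histogram suggests the views are connected by an unobserved path."""
    hist = defaultdict(int)
    for t_exit in exits:
        for t_entry in entries:
            lag = t_entry - t_exit
            if 0.0 < lag <= max_lag:
                hist[round(lag)] += 1
    return dict(hist)

# Objects leaving camera A tend to reappear in camera B about 3 s later.
exits_a = [10.0, 25.0, 40.0]
entries_b = [13.1, 27.9, 43.0]
hist = transit_histogram(exits_a, entries_b)
```

With only off-the-shelf background subtraction providing the exit and entry events, this kind of co-occurrence statistic is enough to learn the camera topology without any calibration, which matches the robustness claim in the abstract.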

Towards Secure and Privacy Sensitive Surveillance

Sven Fleck and Wolfgang Straßer, SmartSurv Vision Systems GmbH

This paper analyzes the requirements of an ideal vision system. Two major challenges are identified: security and privacy. Security ensures reliable and dependable operation where the whole chain is robust against modifications and erasures. This comprises the aspect of authenticity, to qualify for legal actions on one hand and to prevent attacks (e.g., man-in-the-middle) from modifying content to incriminate innocent persons on the other. Privacy is strongly infringed by today's surveillance systems. It has to be ensured that both the derived video analysis results and the adequately filtered imaging stream (if still required) are only accessible to appropriate user groups, to address the privacy dilemma. A vision-based sensor should be applicable in the same application fields as other non-camera-based application-specific sensors, e.g., photoelectric sensors used in men's restrooms (urinals), with the same confidence and trust. Related work is surveyed and a first concept is introduced to address these demands. It is based on a smart camera approach and various certification and authentication mechanisms that allow application-specific sensors to use the power of the visual modality without the traditional drawbacks. Different use cases are discussed from the point of view of the persons surveilled, persons intending to modify content, and persons utilizing the results. Finally, the "survmotion" system is presented, in which some aspects of the concept are implemented, to illustrate the potential applicability in various fields.

A Systematic Approach Towards User-Centric Privacy and Security for Smart Camera Networks

Thomas Winkler and Bernhard Rinner, Institute of Networked and Embedded Systems, Klagenfurt University

The majority of research in the field of camera networks is targeted at distributed and cooperative processing, advanced computer vision algorithms or the development of embedded, ubiquitous camera systems. Privacy and security are topics that are often overlooked or considered as an afterthought. With the digitalization of visual surveillance, data can easily be stored and accessed. This raises the question of how confidential data can be protected, how authenticity and integrity can be ensured, and how access can be restricted. This work discusses security and privacy issues relevant in the context of visual surveillance and camera networks. We try to capture the state of the art on these aspects in the available literature and highlight areas that require special consideration. We present a concept for a privacy-preserving camera system based on Trusted Computing. In our system-level approach, we consider privacy and security as primary goals without limiting the overall usefulness of a camera system.

Multiview Activity Recognition in Smart Homes with Spatio-Temporal Features

Chen Wu, Amir Hossein Khalili and Hamid Aghajan, Stanford University

Recognizing activities in a home environment is challenging due to the variety of activities that can be performed at home and the complexity of the environment. Multiple cameras are usually needed to cover the whole observation area. This adds camera fusion as another challenge to activity recognition. We propose a hierarchical approach that recognizes both coarse-level and fine-level activities, in which different image features and learning methods are used for different activities based on their characteristics. The paper focuses on discussing the second level of activity recognition with spatio-temporal features. Specifically, three fusion approaches for multiview activity recognition with spatio-temporal features are presented, including two decision fusion methods and one feature fusion method. They are comparatively analyzed in terms of their tradeoffs regarding assumptions on system setup, model transferability and recognition rate. Experiments show that challenging activities with subtle motions, such as eating, cutting, scrambling, typing and reading, can be recognized with our approaches.

AMiDiViN: Basic Algorithms for Alarm Management in Distributed Vision Networks

Martin Hoffmann, Uwe Jaenen, Ahmed Fares and Joerg Haehner, Leibniz Universitaet Hannover

With Distributed Vision Networks, drawbacks of common video surveillance systems can be overcome. Apart from an increase in system scalability and reliability, operators of Distributed Vision Networks benefit from real-time scene analysis. When serious incidents are detected by cameras, humans can be informed to take appropriate further action. In this paper, we propose the use of mobile terminals that can connect to wirelessly networked Smart Cameras. Thereby, search requests (containing image features) can be posed by users, and notifications in case of alarms (i.e., the detection of previously published image features) can be sent by cameras. The underlying algorithms and their evaluation, both in a simulated and a real-world environment, are presented in this paper. Notifying a guard in our testbed of five cameras took less than three seconds. An extensive evaluation shows that distributed alarm management is generally feasible in today's wireless networks.

On efficient use of multi-view data for activity recognition

Tommi Maatta, Aki Härmä and Hamid Aghajan, TU Eindhoven and Philips Research Eindhoven

The focus of the paper is on studying five different methods to combine multi-view data from an uncalibrated smart camera network for human activity recognition. The multi-view classification scenarios studied can be divided into two categories: view selection and view fusion methods. Selection uses a single view to classify, whereas fusion merges multi-view data on either the feature or the label level. The five methods are compared in the task of classifying human activities in three fully annotated datasets, MAS, VIHASI and HOMELAB, and a combination dataset, MAS+VIHASI. Classification is performed based on image features computed from silhouette images with a binary-tree-structured classifier using a 1D CRF for temporal modeling. The results presented in the paper show that fusion methods outperform practical selection methods. Selection methods have their advantages, but they strongly depend on how good a selection criterion is used and how well this criterion adapts to different environments. Furthermore, fusion of features outperforms the other scenarios in more controlled settings, but the more variability there is in camera placement and in the characteristics of persons, the more likely it is that improved accuracy in multi-view activity recognition can be achieved by combining candidate labels.
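The contrast between view selection and label-level fusion can be illustrated in a few lines. The activity labels and per-view confidence scores below are made up for the example; the paper's actual classifiers and selection criteria are more involved:

```python
from collections import Counter

def select_best_view(view_labels, view_scores):
    """View selection: trust only the single most confident view."""
    best = max(range(len(view_scores)), key=lambda i: view_scores[i])
    return view_labels[best]

def fuse_labels(view_labels):
    """Label-level decision fusion: majority vote over candidate labels."""
    return Counter(view_labels).most_common(1)[0][0]

# Three camera views classify the same moment of activity.
labels = ["eating", "reading", "eating"]
scores = [0.4, 0.9, 0.6]      # hypothetical per-view confidences
picked = select_best_view(labels, scores)
fused = fuse_labels(labels)
```

Here the two strategies disagree: selection follows the single high-confidence view, while fusion follows the two agreeing views. This is exactly the trade-off the abstract describes, with fusion becoming more attractive as camera placement and subjects vary.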

16:20 — 16:40


16:40 — 17:40


18:00 — 23:00