Demo: Visual Attention for Omnidirectional Images in VR Applications
17th May 2017
Understanding visual attention has always been a topic of great interest in different research communities. This is particularly important in omnidirectional images (ODIs) viewed with a head-mounted display (HMD), where only a fraction of the captured scene is displayed at a time, namely the viewport.
Here, we share a demo that displays a set of ODIs (provided by the user or chosen from the ones available) while collecting the viewport's center position at every animation frame for each ODI. The collected data is automatically downloaded at the end of the session.
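The demo itself runs in the browser, but the structure of the data it records can be sketched in Python. The class and field names below (`ViewportLogger`, `log_frame`, `export`) are hypothetical and only mimic the per-frame viewport-center records the demo collects and downloads at the end of a session:

```python
import json
import time


class ViewportLogger:
    """Collects the viewport center (yaw, pitch, in degrees) for each
    animation frame of each ODI; a hypothetical stand-in for the data
    format the demo records."""

    def __init__(self):
        # ODI name -> list of per-frame records
        self.sessions = {}

    def log_frame(self, odi_name, yaw_deg, pitch_deg):
        """Append one per-frame viewport-center record for an ODI."""
        self.sessions.setdefault(odi_name, []).append(
            {"t": time.time(), "yaw": yaw_deg, "pitch": pitch_deg})

    def export(self, path):
        """The demo downloads the collected data at the end of the
        session; here we simply write it to a JSON file."""
        with open(path, "w") as f:
            json.dump(self.sessions, f, indent=2)
```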
Chao, Fang-Yi; Ozcinar, Cagri; Zhang, Lu; Hamidouche, Wassim; Deforges, Olivier; Smolic, Aljosa
In: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), IEEE, China, 2020.
Omnidirectional videos (ODVs) with spatial audio enable viewers to perceive audio and visual signals from all 360° directions when consuming ODVs with head-mounted displays (HMDs). By predicting salient audio-visual regions, ODV systems can be optimized to provide an immersive sensation of audio-visual stimuli at high quality. Despite the intense recent effort on ODV saliency prediction, the current literature still does not consider the impact of auditory information in ODVs. In this work, we propose an audio-visual saliency (AVS360) model that incorporates a 360° spatial-temporal visual representation and spatial auditory information in ODVs. The proposed AVS360 model is composed of two 3D residual networks (ResNets) that encode visual and audio cues. The first is embedded with a spherical representation technique to extract 360° visual features, and the second extracts audio features from the log mel-spectrogram. We emphasize sound source locations by integrating an audio energy map (AEM) generated from the spatial audio description (i.e., ambisonics) and equator viewing behavior with an equator center bias (ECB). The audio and visual features are combined and fused with the AEM and ECB via an attention mechanism. Our experimental results show that the AVS360 model is significantly superior to five state-of-the-art saliency models. To the best of our knowledge, this is the first work to develop an audio-visual saliency model for ODVs. The code will be made publicly available to foster future research on audio-visual saliency in ODVs.
Fearghail, Colm O; Knorr, Sebastian; Smolic, Aljosa
International Conference on 3D Immersion, 2019.
In cinematic virtual reality, one of the primary challenges from a storytelling perspective is leading the attention of the viewers to ensure that the narrative is understood as desired. Methods from traditional cinema have been applied with varying levels of success. This paper explores the use of a saliency convolutional neural network model and measures its results against the intended viewing area as denoted by the creators and the ground truth of where the viewers actually looked. This information could then be used to further increase the effectiveness of a director's ability to focus attention in cinematic VR.
Ozcinar, Cagri; Cabrera, Julian; Smolic, Aljosa
In: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2019.
Fearghail, Colm O; Ozcinar, Cagri; Knorr, Sebastian; Smolic, Aljosa
In: International Conference for Interactive Digital Storytelling (ICIDS) 2018, Dublin, Ireland, 2018 (received the runner-up best full paper award).
To explore methods that are currently used by professional virtual reality (VR) filmmakers to tell their stories and guide users, we analyze how end-users view 360° video in the presence of directional cues and evaluate whether they are able to follow the actual story of narrative 360° films. In this context, we first collected data from five professional VR filmmakers. The data contains eight 360° videos, the director's cut (i.e., the intended viewing direction of the director), plot points, and the directional cues used for user guidance. Then, we performed a subjective experiment with 20 test subjects viewing the videos while their head orientation was recorded. Finally, we present and discuss the experimental results and show, among other findings, that visual discomfort and disorientation on the part of the viewer not only lessen the immersive quality of the films but also make it difficult for the viewer to gain a full understanding of the narrative the director wished them to view.
Ozcinar, Cagri; Smolic, Aljosa
In: 10th International Conference on Quality of Multimedia Experience (QoMEX 2018), 2018.
Monroy, Rafael; Lutz, Sebastian; Chalasani, Tejo; Smolic, Aljosa
SalNet360: Saliency Maps for Omni-Directional Images with CNN
In: Signal Processing: Image Communication, 2018, ISSN: 0923-5965.
The prediction of visual attention data from any kind of media is of great value to content creators and can be used to efficiently drive encoding algorithms. With the current trend in the Virtual Reality (VR) field, adapting known techniques to this new kind of media is starting to gain momentum. In this paper, we present an architectural extension to any Convolutional Neural Network (CNN) that fine-tunes traditional 2D saliency prediction to Omnidirectional Images (ODIs) in an end-to-end manner. We show that each step in the proposed pipeline works towards making the generated saliency map more accurate with respect to ground-truth data.
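A basic building block for adapting planar 2D saliency techniques to ODIs is the mapping between equirectangular pixels and spherical coordinates. The helper below is an illustrative sketch of that mapping under standard equirectangular conventions, not the paper's actual pipeline:

```python
import numpy as np


def equirect_to_sphere(h, w):
    """Per-pixel spherical coordinates for an h x w equirectangular image.

    Returns two (h, w) arrays: longitude in [-pi, pi) and latitude in
    [-pi/2, pi/2], with row 0 at the north pole and column 0 at the
    left edge (an assumed, conventional layout).
    """
    # Sample at pixel centers, hence the +0.5 offsets.
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi
    return np.meshgrid(lon, lat)
```

With such a map, per-pixel spherical coordinates can be fed to a CNN alongside the image, or used to undistort patches before applying a 2D saliency model.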
Croci, Simone; Knorr, Sebastian; Smolic, Aljosa
In: 14th European Conference on Visual Media Production, London, UK, 2017.
In this paper, we present a novel sharpness mismatch detection (SMD) approach for stereoscopic omnidirectional images (ODIs) for quality control within the post-production workflow, which is the main contribution. In particular, we applied a state-of-the-art SMD approach, originally developed for traditional HD images, and extended it to stereoscopic ODIs. A new efficient method for patch extraction from ODIs was developed based on the spherical Voronoi diagram of equidistant points evenly distributed on the sphere. The subdivision of the ODI into patches allows accurate detection and localization of regions with sharpness mismatch. A second contribution of the paper is the integration of saliency into our SMD approach. In this context, we introduce a novel method for the estimation of saliency maps from viewport data of head-mounted displays (HMDs). Finally, we demonstrate the performance of our SMD approach with data collected from a subjective test with 17 participants.
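The patch-extraction idea can be sketched with off-the-shelf tools: generate approximately equidistant points on the unit sphere (here with a Fibonacci lattice, an assumed choice) and compute their spherical Voronoi diagram with SciPy. This is an illustrative sketch, not the paper's implementation:

```python
import numpy as np
from scipy.spatial import SphericalVoronoi


def fibonacci_sphere(n):
    """n approximately evenly distributed points on the unit sphere
    (Fibonacci lattice)."""
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i       # golden-angle increments
    z = 1.0 - 2.0 * (i + 0.5) / n                # uniform spacing in z
    r = np.sqrt(1.0 - z * z)
    return np.column_stack((r * np.cos(phi), r * np.sin(phi), z))


points = fibonacci_sphere(64)
sv = SphericalVoronoi(points, radius=1.0)
sv.sort_vertices_of_regions()
# one Voronoi region (i.e., one ODI patch) per generator point
```

Each Voronoi region then defines one roughly equal-area patch of the sphere, within which detection and localization can be performed.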
Croci, Simone; Knorr, Sebastian; Goldmann, Lutz; Smolic, Aljosa
In: International Conference on 3D Immersion, Brussels, Belgium, 2017.
In this paper, we present a novel framework for quality control in cinematic VR (360° video) based on Voronoi patches and saliency, which can be used in post-production workflows. Our approach first extracts patches in stereoscopic omnidirectional images (ODIs) using the spherical Voronoi diagram. The subdivision of the ODI into patches allows accurate detection and localization of regions with artifacts. Further, we introduce saliency in order to weight detected artifacts according to the visual attention of end-users. Then, we propose different artifact detection and analysis methods for sharpness mismatch detection (SMD), color mismatch detection (CMD) and disparity distribution analysis. In particular, we took two state-of-the-art approaches for SMD and CMD, originally developed for conventional planar images, and extended them to stereoscopic ODIs. Finally, we evaluated the performance of our framework with a dataset of 18 ODIs for which saliency maps were obtained from a subjective test with 17 participants.
Abreu, Ana De; Ozcinar, Cagri; Smolic, Aljosa
In: 9th International Conference on Quality of Multimedia Experience (QoMEX), 2017.
Understanding visual attention has always been a topic of great interest in the graphics, image/video processing, robotics, and human-computer interaction communities. By understanding salient image regions, compression, transmission, and rendering algorithms can be optimized. This is particularly important in omnidirectional images (ODIs) viewed with a head-mounted display (HMD), where only a fraction of the captured scene is displayed at a time, namely the viewport. In order to predict salient image regions, saliency maps are estimated either by using an eye tracker to collect eye fixations during subjective tests or by using computational models of visual attention. However, eye-tracking developments for ODIs are still in the early stages, and although a large number of saliency models are available, no particular attention has been dedicated to ODIs. Therefore, in this paper, we consider the problem of estimating saliency maps for ODIs viewed with HMDs when the use of an eye-tracker device is not possible. We collected viewport data of 32 participants for 21 ODIs and propose a method to transform the gathered data into saliency maps. The obtained saliency maps are compared in terms of the image exposition time used to display each ODI in the subjective tests. Then, motivated by the equator bias tendency in ODIs, we propose a post-processing method, namely FSM, to adapt current saliency models to the requirements of ODIs. We show that the use of FSM on current models improves their performance by up to 20%. The developed database and testbed are publicly available with this paper.
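As a rough illustration (not the paper's actual method), recorded viewport centers can be turned into a saliency map by accumulating a Gaussian around each center, and an equator bias can be imposed afterwards. The function names, the Gaussian width, and the cosine-of-latitude prior below are all assumptions made for the sketch:

```python
import numpy as np


def saliency_from_viewports(centers, h=256, w=512, sigma_deg=15.0):
    """Accumulate an isotropic Gaussian (width sigma_deg, an assumed
    value) at each viewport center (lon, lat in degrees) on an
    equirectangular grid; a hypothetical stand-in for the paper's
    viewport-to-saliency transformation."""
    lon = np.linspace(-180.0, 180.0, w, endpoint=False)
    lat = np.linspace(90.0, -90.0, h)
    lon_g, lat_g = np.meshgrid(lon, lat)
    sal = np.zeros((h, w))
    for c_lon, c_lat in centers:
        # great-circle (angular) distance to the center, in degrees
        d = np.degrees(np.arccos(np.clip(
            np.sin(np.radians(lat_g)) * np.sin(np.radians(c_lat)) +
            np.cos(np.radians(lat_g)) * np.cos(np.radians(c_lat)) *
            np.cos(np.radians(lon_g - c_lon)), -1.0, 1.0)))
        sal += np.exp(-(d ** 2) / (2.0 * sigma_deg ** 2))
    return sal / sal.max() if sal.max() > 0 else sal


def equator_bias(sal, strength=0.5):
    """Equator-bias post-processing sketch in the spirit of FSM:
    re-weight a saliency map toward the equator with an assumed
    cosine-of-latitude prior."""
    h, _ = sal.shape
    lat = np.radians(np.linspace(90.0, -90.0, h))
    weight = (1.0 - strength) + strength * np.cos(lat)
    return sal * weight[:, None]
```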