2020
Chao, Fang-Yi; Ozcinar, Cagri; Zhang, Lu; Hamidouche, Wassim; Deforges, Olivier; Smolic, Aljosa Towards Audio-Visual Saliency Prediction for Omnidirectional Video with Spatial Audio Conference 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), IEEE, China, 2020. @conference{Smolic2020,
title = {Towards Audio-Visual Saliency Prediction for Omnidirectional Video with Spatial Audio},
author = {Fang-Yi Chao and Cagri Ozcinar and Lu Zhang and Wassim Hamidouche and Olivier Deforges and Aljosa Smolic},
url = {https://ieeexplore.ieee.org/abstract/document/9301766},
doi = {10.1109/VCIP49819.2020.9301766},
year = {2020},
date = {2020-12-01},
publisher = {IEEE},
address = {China},
organization = {2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)},
abstract = {Omnidirectional videos (ODVs) with spatial audio enable viewers to perceive 360° directions of audio and visual signals during the consumption of ODVs with head-mounted displays (HMDs). By predicting salient audio-visual regions, ODV systems can be optimized to provide an immersive sensation of audio-visual stimuli with high quality. Despite the intense recent effort for ODV saliency prediction, the current literature still does not consider the impact of auditory information in ODVs. In this work, we propose an audio-visual saliency (AVS360) model that incorporates 360° spatial-temporal visual representation and spatial auditory information in ODVs. The proposed AVS360 model is composed of two 3D residual networks (ResNets) to encode visual and audio cues. The first one is embedded with a spherical representation technique to extract 360° visual features, and the second one extracts the features of audio using the log mel-spectrogram. We emphasize sound source locations by integrating an audio energy map (AEM) generated from the spatial audio description (i.e., ambisonics) and equator viewing behavior with an equator center bias (ECB). The audio and visual features are combined and fused with AEM and ECB via an attention mechanism. Our experimental results show that the AVS360 model has significant superiority over five state-of-the-art saliency models. To the best of our knowledge, it is the first work that develops an audio-visual saliency model for ODVs. The code will be publicly available to foster future research on audio-visual saliency in ODVs.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
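A minimal sketch of one ingredient described in the abstract above, the audio energy map (AEM) derived from first-order ambisonics: the code below beamforms the four B-format channels toward every pixel of an equirectangular grid and keeps the per-direction power. The channel ordering (W, X, Y, Z), the cardioid beam, and the normalisation are assumptions for illustration, not the authors' implementation.

import numpy as np

def audio_energy_map(ambi_frame, height=90, width=180):
    """ambi_frame: (4, n_samples) first-order B-format block (assumed order W, X, Y, Z)."""
    n_samples = ambi_frame.shape[1]
    cov = ambi_frame @ ambi_frame.T / n_samples        # 4x4 channel correlation matrix
    lat = np.linspace(np.pi / 2, -np.pi / 2, height)   # top row = north pole
    lon = np.linspace(-np.pi, np.pi, width)
    lon_g, lat_g = np.meshgrid(lon, lat)
    # Steering vector [1, dx, dy, dz] for every equirectangular pixel.
    steer = np.stack([np.ones_like(lat_g),
                      np.cos(lat_g) * np.cos(lon_g),
                      np.cos(lat_g) * np.sin(lon_g),
                      np.sin(lat_g)], axis=-1)         # (height, width, 4)
    # Cardioid beam power per direction: 0.25 * v^T C v (non-negative since C is PSD).
    energy = 0.25 * np.einsum('hwi,ij,hwj->hw', steer, cov, steer)
    return energy / (energy.max() + 1e-12)             # normalise to [0, 1]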
2019
Fearghail, Colm O; Knorr, Sebastian; Smolic, Aljosa Analysis of Intended Viewing Area vs Estimated Saliency on Narrative Plot Structures in VR Film Conference International Conference on 3D Immersion, 2019. @conference{Fearghail2019,
title = {Analysis of Intended Viewing Area vs Estimated Saliency on Narrative Plot Structures in VR Film},
author = {Colm O Fearghail and Sebastian Knorr and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/?attachment_id=4339},
year = {2019},
date = {2019-12-11},
booktitle = {International Conference on 3D Immersion},
abstract = {In cinematic virtual reality one of the primary challenges from a storytelling perspective is that of leading the attention of the viewers to ensure that the narrative is understood as desired. Methods from traditional cinema have been applied to varying levels of success. This paper explores the use of a saliency convolutional neural network model and measures its results against the intended viewing area as denoted by the creators and the ground truth as to where the viewers actually looked. This information could then be used to further increase the effectiveness of a director’s ability to focus attention in cinematic VR.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
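The analysis above compares an estimated saliency map against the creators' intended viewing area and the viewers' ground truth. A minimal sketch of two standard saliency comparison metrics (Pearson correlation coefficient and KL divergence) is given below; the paper's exact evaluation protocol is not reproduced.

import numpy as np

def cc(pred, gt):
    """Pearson correlation coefficient between two saliency maps of equal shape."""
    p = (pred - pred.mean()) / (pred.std() + 1e-12)
    g = (gt - gt.mean()) / (gt.std() + 1e-12)
    return float(np.mean(p * g))

def kl_div(pred, gt, eps=1e-12):
    """KL divergence of the ground-truth distribution from the predicted one."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float(np.sum(g * np.log(g / (p + eps) + eps)))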
Ozcinar, Cagri; Cabrera, Julian; Smolic, Aljosa Visual Attention-Aware Omnidirectional Video Streaming Using Optimal Tiles for Virtual Reality Journal Article In: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2019. @article{Ozcinar2019,
title = {Visual Attention-Aware Omnidirectional Video Streaming Using Optimal Tiles for Virtual Reality},
author = {Cagri Ozcinar and Julian Cabrera and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/research/va-aware-odv-streaming/
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/03/JETCAS_SI_immersive_2018_pc.pdf},
year = {2019},
date = {2019-05-15},
journal = {IEEE Journal on Emerging and Selected Topics in Circuits and Systems },
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2018
Fearghail, Colm O; Ozcinar, Cagri; Knorr, Sebastian; Smolic, Aljosa Director's Cut - Analysis of Aspects of Interactive Storytelling for VR Films Inproceedings In: International Conference for Interactive Digital Storytelling (ICIDS) 2018, Dublin, Ireland, 2018, (Received the runner up best full paper award). @inproceedings{Fearghail2018,
title = {Director's Cut - Analysis of Aspects of Interactive Storytelling for VR Films},
author = {Colm O Fearghail and Cagri Ozcinar and Sebastian Knorr and Aljosa Smolic },
url = {https://v-sense.scss.tcd.ie:443/research/3dof/directors-cut-analysis-of-aspects-of-interactive-storytelling-for-vr-films/
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2018/12/storyTelling.pdf},
year = {2018},
date = {2018-12-05},
booktitle = {International Conference for Interactive Digital Storytelling (ICIDS) 2018},
address = {Dublin, Ireland},
abstract = {To explore methods that are currently used by professional virtual reality (VR) filmmakers to tell their stories and guide users, we analyze how end-users view 360° video in the presence of directional cues and evaluate if they are able to follow the actual story of narrative 360° films. In this context, we first collected data from five professional VR filmmakers. The data contains eight 360° videos, the director's cut, which is the intended viewing direction of the director, plot points and directional cues used for user guidance. Then, we performed a subjective experiment with 20 test subjects viewing the videos while their head orientation was recorded. Finally, we present and discuss the experimental results and show, among others, that visual discomfort and disorientation on the part of the viewer not only lessen the immersive quality of the films but also cause difficulties in the viewer gaining a full understanding of the narrative that the director wished them to view.},
note = {Received the runner up best full paper award},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
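The study above records viewers' head orientations and compares them with the director's cut. A minimal sketch of one plausible comparison, assuming orientations are stored as (yaw, pitch) pairs in degrees, is the great-circle angle between the viewer's and the director's viewing directions; this is not the authors' analysis code.

import numpy as np

def to_unit_vector(yaw_deg, pitch_deg):
    """Convert a (yaw, pitch) viewing direction in degrees to a 3D unit vector."""
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    return np.array([np.cos(pitch) * np.cos(yaw),
                     np.cos(pitch) * np.sin(yaw),
                     np.sin(pitch)])

def angular_error(viewer_dir, director_dir):
    """Great-circle angle (degrees) between viewer and intended viewing directions."""
    a = to_unit_vector(*viewer_dir)
    b = to_unit_vector(*director_dir)
    return float(np.degrees(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))))

# Example: a viewer looking 40 degrees off the intended direction in yaw.
print(angular_error((40.0, 0.0), (0.0, 0.0)))   # ~40.0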
Ozcinar, Cagri; Smolic, Aljosa Visual Attention in Omnidirectional Video for Virtual Reality Applications Inproceedings In: 10th International Conference on Quality of Multimedia Experience (QoMEX 2018), 2018. @inproceedings{Ozcinar2018,
title = {Visual Attention in Omnidirectional Video for Virtual Reality Applications},
author = {Cagri Ozcinar and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/research/3dof/visual-attention-in-omnidirectional-video-for-virtual-reality-applications/
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2018/05/OmniAttention2018.pdf},
year = {2018},
date = {2018-05-29},
booktitle = {10th International Conference on Quality of Multimedia Experience (QoMEX 2018) },
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Monroy, Rafael; Lutz, Sebastian; Chalasani, Tejo; Smolic, Aljosa SalNet360: Saliency Maps for omni-directional images with CNN Journal Article In: Signal Processing: Image Communication, 2018, ISSN: 0923-5965. @article{Monroy_SalNet2018,
title = {SalNet360: Saliency Maps for omni-directional images with CNN},
author = {Rafael Monroy and Sebastian Lutz and Tejo Chalasani and Aljosa Smolic},
url = {https://arxiv.org/abs/1709.06505
https://github.com/V-Sense/salnet360
https://v-sense.scss.tcd.ie:443/research/3dof/salnet360-saliency-maps-for-omni-directional-images-with-cnn/},
doi = {10.1016/j.image.2018.05.005},
issn = {0923-5965},
year = {2018},
date = {2018-05-12},
journal = {Signal Processing: Image Communication},
abstract = {The prediction of Visual Attention data from any kind of media is of valuable use to content creators and used to efficiently drive encoding algorithms. With the current trend in the Virtual Reality (VR) field, adapting known techniques to this new kind of media is starting to gain momentum. In this paper, we present an architectural extension to any Convolutional Neural Network (CNN) to fine-tune traditional 2D saliency prediction to Omnidirectional Images (ODIs) in an end-to-end manner. We show that each step in the proposed pipeline works towards making the generated saliency map more accurate with respect to ground truth data.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
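SalNet360 fine-tunes 2D saliency CNNs for omnidirectional images, in part by supplying per-pixel spherical coordinates alongside the image. The sketch below only illustrates that idea by appending longitude/latitude channels to an equirectangular image; the actual pipeline operates on undistorted patches and a specific network architecture that is not reproduced here.

import numpy as np

def add_spherical_channels(equirect_img):
    """equirect_img: (H, W, 3) float image -> (H, W, 5) with longitude/latitude channels."""
    h, w, _ = equirect_img.shape
    # Longitude in [-pi, pi], latitude in [-pi/2, pi/2], one value per pixel.
    lon = np.linspace(-np.pi, np.pi, w, dtype=np.float32)
    lat = np.linspace(np.pi / 2, -np.pi / 2, h, dtype=np.float32)
    lon_g, lat_g = np.meshgrid(lon, lat)
    coords = np.stack([lon_g, lat_g], axis=-1)
    return np.concatenate([equirect_img.astype(np.float32), coords], axis=-1)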
2017
Croci, Simone; Knorr, Sebastian; Smolic, Aljosa Saliency-Based Sharpness Mismatch Detection For Stereoscopic Omnidirectional Images Inproceedings In: 14th European Conference on Visual Media Production, London, UK, 2017. @inproceedings{Croci2017a,
title = {Saliency-Based Sharpness Mismatch Detection For Stereoscopic Omnidirectional Images},
author = {Simone Croci and Sebastian Knorr and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2017/10/2017_CVMP_Saliency-Based-Sharpness-Mismatch-Detection-For-Stereoscopic-Omnidirectional-Images.pdf},
doi = {10.1145/3150165.3150168},
year = {2017},
date = {2017-12-11},
booktitle = {14th European Conference on Visual Media Production},
address = {London, UK},
abstract = {In this paper, we present a novel sharpness mismatch detection (SMD) approach for stereoscopic omnidirectional images (ODI) for quality control within the post-production workflow, which is the main contribution. In particular, we applied a state of the art SMD approach, which was originally developed for traditional HD images, and extended it to stereoscopic ODIs. A new efficient method for patch extraction from ODIs was developed based on the spherical Voronoi diagram of equidistant points evenly distributed on the sphere. The subdivision of the ODI into patches allows an accurate detection and localization of regions with sharpness mismatch. A second contribution of the paper is the integration of saliency into our SMD approach. In this context, we introduce a novel method for the estimation of saliency maps from viewport data of head-mounted displays (HMD). Finally, we demonstrate the performance of our SMD approach with data collected from a subjective test with 17 participants.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
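The patch extraction described above builds on a spherical Voronoi diagram of points evenly distributed on the sphere. A minimal sketch using a Fibonacci lattice (one common choice for evenly spread points, not necessarily the one used in the paper) and SciPy's SphericalVoronoi:

import numpy as np
from scipy.spatial import SphericalVoronoi

def fibonacci_sphere(n):
    """n roughly evenly distributed unit vectors on the sphere."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / n)              # polar angle
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i          # golden-angle increments
    return np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=1)

points = fibonacci_sphere(64)
sv = SphericalVoronoi(points, radius=1.0)
sv.sort_vertices_of_regions()
# Each entry of sv.regions indexes the sv.vertices bounding one Voronoi patch;
# ODI pixels can then be assigned to the patch of their nearest seed point.
print(len(sv.regions), "patches")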
Croci, Simone; Knorr, Sebastian; Goldmann, Lutz; Smolic, Aljosa A Framework for Quality Control in Cinematic VR Based on Voronoi Patches and Saliency Inproceedings In: International Conference on 3D Immersion, Brussels, Belgium, 2017. @inproceedings{Croci2017b,
title = {A Framework for Quality Control in Cinematic VR Based on Voronoi Patches and Saliency},
author = {Simone Croci and Sebastian Knorr and Lutz Goldmann and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2017/10/2017_IC3D_A-FRAMEWORK-FOR-QUALITY-CONTROL-IN-CINEMATIC-VR-BASED-ON-VORONOI-PATCHES-AND-SALIENCY.pdf},
year = {2017},
date = {2017-12-11},
booktitle = {International Conference on 3D Immersion},
address = {Brussels, Belgium},
abstract = {In this paper, we present a novel framework for quality control in cinematic VR (360-video) based on Voronoi patches and saliency which can be used in post-production workflows. Our approach first extracts patches in stereoscopic omnidirectional images (ODI) using the spherical Voronoi diagram. The subdivision of the ODI into patches allows an accurate detection and localization of regions with artifacts. Further, we introduce saliency in order to weight detected artifacts according to the visual attention of end-users. Then, we propose different artifact detection and analysis methods for sharpness mismatch detection (SMD), color mismatch detection (CMD) and disparity distribution analysis. In particular, we took two state of the art approaches for SMD and CMD, which were originally developed for conventional planar images, and extended them to stereoscopic ODIs. Finally, we evaluated the performance of our framework with a dataset of 18 ODIs for which saliency maps were obtained from a subjective test with 17 participants.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
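The framework above weights detected artifacts by the visual attention of end-users. A minimal sketch of that idea, with illustrative names only: per-patch artifact scores (e.g. from sharpness- or colour-mismatch detection) are averaged with weights proportional to the mean saliency of each Voronoi patch, so artifacts in regions viewers actually attend to dominate the global score.

import numpy as np

def weighted_artifact_score(patch_scores, patch_saliency):
    """patch_scores, patch_saliency: 1-D arrays with one value per Voronoi patch."""
    weights = patch_saliency / (patch_saliency.sum() + 1e-12)
    return float(np.sum(weights * patch_scores))

# Example: the same artifact score matters more when it falls in a salient patch.
scores = np.array([0.1, 0.8, 0.1])
print(weighted_artifact_score(scores, np.array([0.1, 0.1, 0.8])))  # low impact
print(weighted_artifact_score(scores, np.array([0.1, 0.8, 0.1])))  # high impact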
Abreu, Ana De; Ozcinar, Cagri; Smolic, Aljosa Look around you: saliency maps for omnidirectional images in VR applications Inproceedings In: 9th International Conference on Quality of Multimedia Experience (QoMEX), 2017. @inproceedings{AnaDeAbreuCagriOzcinar2017,
title = {Look around you: saliency maps for omnidirectional images in VR applications},
author = { Ana De Abreu and Cagri Ozcinar and Aljosa Smolic},
url = {https://www.researchgate.net/publication/317184829_Look_around_you_Saliency_maps_for_omnidirectional_images_in_VR_applications},
year = {2017},
date = {2017-05-31},
booktitle = {9th International Conference on Quality of Multimedia Experience (QoMEX)},
abstract = {Understanding visual attention has always been a topic of great interest in the graphics, image/video processing, robotics and human computer interaction communities. By understanding salient image regions, the compression, transmission and rendering algorithms can be optimized. This is particularly important in omnidirectional images (ODIs) viewed with a head-mounted display (HMD), where only a fraction of the captured scene is displayed at a time, namely viewport. In order to predict salient image regions, saliency maps are estimated either by using an eye tracker to collect eye fixations during subjective tests or by using computational models of visual attention. However, eye tracking developments for ODIs are still in the early stages and although a large list of saliency models are available, no particular attention has been dedicated to ODIs. Therefore, in this paper, we consider the problem of estimating saliency maps for ODIs viewed with HMDs, when the use of an eye tracker device is not possible. We collected viewport data of 32 participants for 21 ODIs and propose a method to transform the gathered data into saliency maps. The obtained saliency maps are compared in terms of image exposition time used to display each ODI in the subjective tests. Then, motivated by the equator bias tendency in ODIs, we propose a post-processing method, namely FSM, to adapt current saliency models to ODIs requirements. We show that the use of FSM on current models improves their performance by up to 20%. The developed database and testbed are publicly available with this paper.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
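The FSM post-processing described above adapts existing saliency models to the equator bias observed in ODIs. The sketch below shows the general idea as a latitude-dependent Gaussian re-weighting of a 2D saliency map; the paper's exact formulation and parameters are not reproduced.

import numpy as np

def apply_equator_bias(saliency, sigma_deg=25.0):
    """saliency: (H, W) map in equirectangular layout; sigma_deg: spread of the prior."""
    h, _ = saliency.shape
    lat = np.linspace(90.0, -90.0, h)                      # degrees, top row to bottom row
    prior = np.exp(-0.5 * (lat / sigma_deg) ** 2)          # Gaussian centred on the equator
    biased = saliency * prior[:, None]
    return biased / (biased.sum() + 1e-12)                 # renormalise to a distribution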