2020
Rossi, Silvia; Ozcinar, Cagri; Smolic, Aljosa; Toni, Laura Do Users Behave Similarly in VR? Investigation of the User Influence on the System Design Journal Article In: ACM Transactions on Multimedia Computing Communications and Applications (TOMM), 2020. @article{Rossi2020,
title = {Do Users Behave Similarly in VR? Investigation of the User Influence on the System Design},
author = {Silvia Rossi and Cagri Ozcinar and Aljosa Smolic and Laura Toni},
url = {https://v-sense.scss.tcd.ie:443/research/3dof/vr_user_behaviour_system_design/},
year = {2020},
date = {2020-02-03},
urldate = {2020-02-03},
journal = {ACM Transactions on Multimedia Computing Communications and Applications (TOMM)},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Matysiak, Pierre; Grogan, Mairéad; Le Pendu, Mikaël; Alain, Martin; Zerman, Emin; Smolic, Aljosa High Quality Light Field Extraction and Post-Processing for Raw Plenoptic Data Journal Article In: IEEE Transactions on Image Processing, vol. 29, pp. 4188-4203, 2020, ISSN: 1941-0042. @article{Matysiak2020,
title = {High Quality Light Field Extraction and Post-Processing for Raw Plenoptic Data},
author = {Matysiak, Pierre and Grogan, Mairéad and Le Pendu, Mikaël and Alain, Martin and Zerman, Emin and Smolic, Aljosa},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2020/02/TIP_HQLF_authorV_small.pdf},
doi = {10.1109/TIP.2020.2967600},
issn = {1941-0042},
year = {2020},
date = {2020-01-23},
journal = {IEEE Transactions on Image Processing},
volume = {29},
pages = {4188-4203},
abstract = {Light field technology has reached a certain level of maturity in recent years, and its applications in both computer vision research and industry are offering new perspectives for cinematography and virtual reality. Several methods of capture exist, each with its own advantages and drawbacks. One of these methods involves the use of handheld plenoptic cameras. While these cameras offer freedom and ease of use, they also suffer from various visual artefacts and inconsistencies. We propose in this paper an advanced pipeline that enhances their output. After extracting sub-aperture images from the RAW images with our demultiplexing method, we perform three correction steps. We first remove hot pixel artefacts, then correct colour inconsistencies between views using a colour transfer method, and finally we apply a state of the art light field denoising technique to ensure a high image quality. An in-depth analysis is provided for every step of the pipeline, as well as their interaction within the system. We compare our approach to existing state of the art sub-aperture image extracting algorithms, using a number of metrics as well as a subjective experiment. Finally, we showcase the positive impact of our system on a number of relevant light field applications.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
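For readers unfamiliar with the correction steps listed in the abstract, the sketch below illustrates the kind of median-based hot-pixel suppression that the first correction step of such a pipeline typically builds on. It is a generic illustration only, not the authors' implementation; the function name, the 3x3 window and the threshold value are assumptions.

```python
# Illustrative hot-pixel suppression for a single-channel sub-aperture view.
# Generic median-based outlier filtering; NOT the authors' pipeline.
import numpy as np
from scipy.ndimage import median_filter

def remove_hot_pixels(view, threshold=0.1):
    """Replace pixels that deviate strongly from their 3x3 local median.
    `view` is assumed to be a float image in [0, 1]."""
    med = median_filter(view, size=3)      # local median estimate
    hot = np.abs(view - med) > threshold   # candidate hot pixels
    out = view.copy()
    out[hot] = med[hot]                    # replace outliers only
    return out
```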
2019
Grogan, Mairéad; Smolic, Aljosa L2 based Colour Correction for Light Field Arrays Inproceedings In: Proceedings of the 16th ACM SIGGRAPH European Conference on Visual Media Production, 2019. @inproceedings{Grogan2019b,
title = {L2 based Colour Correction for Light Field Arrays},
author = {Mairéad Grogan and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/research/l2-based-colour-correction-for-light-field-arrays/},
year = {2019},
date = {2019-12-17},
urldate = {2019-12-17},
booktitle = {Proceedings of the 16th ACM SIGGRAPH European Conference on Visual Media Production},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Hudon, Matis; Lutz, Sebastian; Pagés, Rafael; Smolic, Aljosa Augmenting Hand-Drawn Art with Global Illumination Effects through Surface Inflation Conference The 16th ACM SIGGRAPH European Conference on Visual Media Production, 2019. @conference{Hudon2019b,
title = {Augmenting Hand-Drawn Art with Global Illumination Effects through Surface Inflation},
author = {Matis Hudon and Sebastian Lutz and Rafael Pagés and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/research/augmenting-hand-drawn-art-with-global-illumination-effects-through-surface-inflation/
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/12/CVMP2019.pdf},
year = {2019},
date = {2019-12-17},
booktitle = {The 16th ACM SIGGRAPH European Conference on Visual Media Production},
abstract = {We present a method for augmenting hand-drawn characters and creatures with global illumination effects. Given a single view drawing only, we use a novel CNN to predict a high-quality normal map of the same resolution. The predicted normals are then used as a guide to inflate a surface into a 3D proxy mesh visually consistent and suitable to augment the input 2D art with convincing global illumination effects while keeping the hand-drawn look and feel. Along with this paper, a new high resolution dataset of line drawings with corresponding ground-truth normal and depth maps will be shared. We validate our CNN, comparing our neural predictions qualitatively and quantitatively with the recent state of the art, show results for various hand-drawn images and animations, and compare with alternative modeling approaches.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Fearghail, Colm O; Knorr, Sebastian; Smolic, Aljosa Analysis of Intended Viewing Area vs Estimated Saliency on Narrative Plot Structures in VR Film Conference International Conference on 3D Immersion, 2019. @conference{Fearghail2019,
title = {Analysis of Intended Viewing Area vs Estimated Saliency on Narrative Plot Structures in VR Film},
author = {Colm O Fearghail and Sebastian Knorr and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/?attachment_id=4339},
year = {2019},
date = {2019-12-11},
booktitle = {International Conference on 3D Immersion},
abstract = {In cinematic virtual reality one of the primary challenges from a storytelling perspective is that of leading the attention of the viewers to ensure that the narrative is understood as desired. Methods from traditional cinema have been applied with varying levels of success. This paper explores the use of a saliency convolutional neural network model and measures its results against the intended viewing area as denoted by the creators and the ground truth as to where the viewers actually looked. This information could then be used to further increase the effectiveness of a director’s ability to focus attention in cinematic VR.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Gao, Pan; Smolic, Aljosa Occlusion-Aware Depth Map Coding Optimization Using Allowable Depth Map Distortions Journal Article In: IEEE Transactions on Image Processing, vol. 28, no. 11, pp. 5266-5280, 2019, ISSN: 1941-0042. @article{Gao2019b,
title = {Occlusion-Aware Depth Map Coding Optimization Using Allowable Depth Map Distortions},
author = {Pan Gao and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/08/Occlusion-Aware-Depth-Map-Coding-Optimization-Using-Allowable-Depth-Map-Distortions-..pdf},
doi = {10.1109/TIP.2019.2919198},
issn = {1941-0042},
year = {2019},
date = {2019-11-19},
journal = { IEEE Transactions on Image Processing},
volume = {28},
number = {11},
pages = {5266-5280},
abstract = {In depth map coding, rate-distortion optimization for those pixels that will cause occlusion in view synthesis is a rather challenging task, since the synthesis distortion estimation is complicated by the warping competition and the occlusion order can be easily changed by the adopted optimization strategy. In this paper, an efficient depth map coding approach using allowable depth map distortions is proposed for occlusion-inducing pixels. First, we derive the range of allowable depth level change for both the zero disparity error case and the non-zero disparity error case with theoretic and geometrical proofs. Then, we formulate the problem of optimally selecting the depth distortion within the allowable depth distortion range with the objective of minimizing the overall synthesis distortion involved in the occlusion. The unicity and occlusion order invariance properties of the allowable depth distortion range are demonstrated. Finally, we propose a dynamic programming based algorithm to locate the optimal depth distortion for each pixel. Simulation results illustrate the performance improvement of the proposed algorithm over the other state-of-the-art depth map coding optimization schemes.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Trottnow, Jonas; Spielmann, Simon; Herfet, Thorsten; Lange, Tobias; Chelli, Kelvin; Solony, Marek; Smrz, Pavel; Zemcik, Pavel; Aenchbacher, Weston; Grogan, Mairéad; Alain, Martin; Smolic, Aljosa; Canham, Trevor; Vu-Thanh, Olivier; Vázquez-Corral, Javier; Bertalmío, Marcelo The Potential of Light Fields in Media Productions Inproceedings In: SIGGRAPH Asia Technical Briefs, Association for Computing Machinery, 2019. @inproceedings{Trottnow2019,
title = {The Potential of Light Fields in Media Productions},
author = {Jonas Trottnow and Simon Spielmann and Thorsten Herfet and Tobias Lange and Kelvin Chelli and Marek Solony and Pavel Smrz and Pavel Zemcik and Weston Aenchbacher and Mairéad Grogan and Martin Alain and Aljosa Smolic and Trevor Canham and Olivier Vu-Thanh and Javier Vázquez-Corral and Marcelo Bertalmío},
url = {https://v-sense.scss.tcd.ie:443/research/light-fields/the-potential-of-light-fields-in-media-productions/},
year = {2019},
date = {2019-11-17},
booktitle = {SIGGRAPH Asia Technical Briefs},
publisher = {Association for Computing Machinery},
abstract = {One aspect of the EU funded project SAUCE is to explore the possibilities and challenges of integrating light field capturing and processing into media productions. A special light field camera was built by Saarland University and was first tested under production conditions in the test production ``Unfolding'' as part of the SAUCE project. Filmakademie Baden-Württemberg developed the contentual frame, executed the post-production and prepared a complete previsualization. Calibration and post-processing algorithms are developed by Trinity College Dublin and the Brno University of Technology. This document describes challenges during building and shooting with the light field camera array, as well as its potential and challenges for post-production.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Chalasani, Tejo; Smolic, Aljosa Simultaneous Segmentation and Recognition: Towards more accurate Ego Gesture Recognition Workshop ICCV Workshop on Egocentric Perception and Computing, IEEE, 2019. @workshop{ssar_tejo_2019,
title = {Simultaneous Segmentation and Recognition: Towards more accurate Ego Gesture Recognition},
author = {Tejo Chalasani and Aljosa Smolic},
url = {https://arxiv.org/pdf/1909.08606},
year = {2019},
date = {2019-11-02},
urldate = {2019-11-02},
booktitle = {ICCV Workshop on Egocentric Perception and Computing},
publisher = {IEEE},
abstract = {Ego hand gestures can be used as an interface in AR and VR environments. While the context of an image is important for tasks like scene understanding, object recognition, image caption generation and activity recognition, it plays a minimal role in ego hand gesture recognition. An ego hand gesture used for AR and VR environments conveys the same information regardless of the background. With this idea in mind, we present our work on ego hand gesture recognition that produces embeddings from RGB images with ego hands, which are simultaneously used for ego hand segmentation and ego gesture recognition. To this end, we achieved better recognition accuracy (96.9%) compared to the state of the art (92.2%) on the biggest ego hand gesture dataset available publicly. We present a gesture recognition deep neural network which recognises ego hand gestures from videos (videos containing a single gesture) by generating and recognising embeddings of ego hands from image sequences of varying length. We introduce the concept of simultaneous segmentation and recognition applied to ego hand gestures, present the network architecture, the training procedure and the results compared to the state of the art on the EgoGesture dataset.},
keywords = {},
pubstate = {published},
tppubtype = {workshop}
}
Ghosal, Koustav; Rana, Aakanksha; Smolic, Aljosa Aesthetic Image Captioning from Weakly-Labelled Photographs Inproceedings In: ICCV 2019 Workshop on Cross-Modal Learning in Real World, 2019. @inproceedings{Ghosal_iccv2019,
title = {Aesthetic Image Captioning from Weakly-Labelled Photographs},
author = {Koustav Ghosal and Aakanksha Rana and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/08/ICCVW_CROMOL_2019.pdf},
year = {2019},
date = {2019-10-27},
urldate = {2019-10-27},
booktitle = {ICCV 2019 Workshop on Cross-Modal Learning in Real World},
abstract = {Aesthetic image captioning (AIC) refers to the multi-modal task of generating critical textual feedbacks for photographs. While in natural image captioning (NIC), deep models are trained in an end-to-end manner using large curated datasets such as MS-COCO, no such large-scale, clean dataset exists for AIC. Towards this goal, we propose an automatic cleaning strategy to create a benchmarking AIC dataset, by exploiting the images and noisy comments easily available from photography websites. We propose a probabilistic caption-filtering method for cleaning the noisy web-data, and compile a large-scale, clean dataset ‘AVA-Captions’ (∼230,000 images with ∼5 captions per image). Additionally, by exploiting the latent associations between aesthetic attributes, we propose a strategy for training a convolutional neural network (CNN) based visual feature extractor, typically the first component of an AIC framework. The strategy is weakly supervised and can be effectively used to learn rich aesthetic representations, without requiring expensive ground-truth annotations. We finally showcase a thorough analysis of the proposed contributions using automatic metrics and subjective evaluations.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Alain, Martin; Aenchbacher, Weston; Smolic, Aljosa Interactive Light Field Tilt-Shift Refocus with Generalized Shift-and-Sum Inproceedings In: Proceedings of European Light Field Imaging Workshop, 2019. @inproceedings{Alain2019b,
title = {Interactive Light Field Tilt-Shift Refocus with Generalized Shift-and-Sum},
author = {Martin Alain and Weston Aenchbacher and Aljosa Smolic},
url = {https://arxiv.org/pdf/1910.04699.pdf},
year = {2019},
date = {2019-10-04},
booktitle = {Proceedings of European Light Field Imaging Workshop},
abstract = {Since their introduction more than two decades ago, light fields have gained considerable interest in graphics and vision communities due to their ability to provide the user with interactive visual content. One of the earliest and most common light field operations is digital refocus, enabling the user to choose the focus and depth-of-field for the image after capture. A common interactive method for such an operation utilizes disparity estimations, readily available from the light field, to allow the user to point-and-click on the image to choose the location of the refocus plane.
In this paper, we address the interactivity of a lesser-known light field operation: refocus to a non-frontoparallel plane, simulating the result of traditional tilt-shift photography. For this purpose we introduce a generalized shift-and-sum framework. Further, we show that the inclusion of depth information allows for intuitive interactive methods for placement of the refocus plane. In addition to refocusing, light fields also enable the user to interact with the viewpoint, which can be easily included in the proposed generalized shift-and-sum framework.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
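As background to the generalized framework above, the sketch below shows classic frontoparallel shift-and-sum refocusing, the operation the paper extends to non-frontoparallel (tilt-shift) planes. The (U, V, H, W) layout, variable names and the use of scipy.ndimage.shift are assumptions for illustration, not the authors' code.

```python
# Classic frontoparallel shift-and-sum light field refocus (illustrative sketch).
import numpy as np
from scipy.ndimage import shift

def refocus(light_field, slope):
    """light_field: array of shape (U, V, H, W); `slope` selects the focal plane.
    Each sub-aperture view is translated proportionally to its angular
    coordinates (u, v) relative to the centre view, then all views are averaged."""
    U, V, H, W = light_field.shape
    uc, vc = (U - 1) / 2.0, (V - 1) / 2.0      # centre view coordinates
    acc = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            dy, dx = slope * (u - uc), slope * (v - vc)
            acc += shift(light_field[u, v], (dy, dx), order=1)
    return acc / (U * V)
```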
Ozcinar*, Cagri; Rana*, Aakanksha; Smolic, Aljosa Super-resolution of Omnidirectional Images Using Adversarial Learning Inproceedings In: IEEE 21st International Workshop on Multimedia Signal Processing (MMSP 2019), 2019. @inproceedings{Ozcinar-Rana*2019,
title = {Super-resolution of Omnidirectional Images Using Adversarial Learning},
author = {Cagri Ozcinar* and Aakanksha Rana* and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/09/mmsp_sr_2019.pdf
https://www.researchgate.net/publication/334612203_Super-resolution_of_Omnidirectional_Images_Using_Adversarial_Learning},
year = {2019},
date = {2019-09-27},
booktitle = {IEEE 21st International Workshop on Multimedia Signal Processing (MMSP 2019)},
abstract = {An omnidirectional image (ODI) enables viewers to look in every direction from a fixed point through a head-mounted display providing an immersive experience compared to that of a standard image. Designing immersive virtual reality systems with ODIs is challenging as they require high resolution content. In this paper, we study super-resolution for ODIs and propose an improved generative adversarial network based model which is optimized to handle the artifacts obtained in the spherical observational space. Specifically, we propose to use a fast PatchGAN discriminator, as it needs fewer parameters and improves the super-resolution at a fine scale. We also explore the generative models with adversarial learning by introducing a spherical-content specific loss function, called 360-SS. To train and test the performance of our proposed model we prepare a dataset of 4500 ODIs. Our results demonstrate the efficacy of the proposed method and identify new challenges in ODI super-resolution for future investigations.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
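The paper's 360-SS loss is specific to spherical content; as a generic illustration of why equirectangular (ODI) content needs such treatment, the sketch below applies the cosine-latitude weighting familiar from WS-PSNR to a plain L1 loss. This is not the proposed 360-SS loss; the formulation and names are assumptions.

```python
# Cosine-latitude weighting for equirectangular content (WS-PSNR-style), as a
# generic example of spherical-aware losses; NOT the paper's 360-SS loss.
import numpy as np

def latitude_weights(height, width):
    """Per-pixel weights cos(latitude) for an equirectangular image."""
    lat = (np.arange(height) + 0.5) / height * np.pi - np.pi / 2.0
    w = np.cos(lat)                          # 1 at the equator, ~0 at the poles
    return np.tile(w[:, None], (1, width))

def weighted_l1(pred, target):
    """pred, target: single-channel equirectangular maps of shape (H, W)."""
    w = latitude_weights(*pred.shape)
    return np.sum(w * np.abs(pred - target)) / np.sum(w)
```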
Alain, Martin; Ozcinar, Cagri; Smolic, Aljosa A Study of Light Field Streaming for an Interactive Refocusing Application Inproceedings In: 2019 The International Conference on Image Processing (IEEE ICIP 2019), 2019. @inproceedings{Alain2019,
title = {A Study of Light Field Streaming for an Interactive Refocusing Application},
author = {Martin Alain and Cagri Ozcinar and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/research/light-fields/lf-streaming/
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/05/LF_streaming.pdf},
year = {2019},
date = {2019-09-25},
booktitle = {2019 The International Conference on Image Processing (IEEE ICIP 2019)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Dib, Elian; Le Pendu, Mikael; Guillemot, Christine Light Field Compression using Fourier Disparity Layers Conference IEEE International Conference on Image Processing (ICIP 2019), 2019. @conference{Dib2019,
title = {Light Field Compression using Fourier Disparity Layers},
author = {Elian Dib and Le Pendu, Mikael and Christine Guillemot},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/05/FDL_compression.pdf},
year = {2019},
date = {2019-09-25},
booktitle = {IEEE International Conference on Image Processing (ICIP 2019)},
abstract = {In this paper, we present a compression method for light fields based on the Fourier Disparity Layer representation. This light field representation consists in a set of layers that can be efficiently constructed in the Fourier domain from a sparse set of views, and then used to reconstruct intermediate viewpoints without requiring a disparity map. In the proposed compression scheme, a subset of light field views is encoded first and used to construct a Fourier Disparity Layer model from which a second subset of views is predicted. After encoding and decoding the residual of those predicted views, a larger set of decoded views is available, allowing us to refine the layer model in order to predict the next views with increased accuracy. The procedure is repeated until the complete set of light field views is encoded. Following this principle, we investigate in the paper different scanning orders of the light field views and analyse their respective efficiencies regarding the compression performance.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Zerman*, Emin; Rana*, Aakanksha; Smolic, Aljosa ColorNet - Estimating Colorfulness in Natural Images Inproceedings In: The International Conference on Image Processing (IEEE ICIP 2019), IEEE 2019. @inproceedings{zerman2019colornet,
title = {ColorNet - Estimating Colorfulness in Natural Images},
author = {Emin Zerman* and Aakanksha Rana* and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/research/deep-learning/colornet-estimating-colorfulness/
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/05/ColorNet_ICIP2019_preprint.pdf},
doi = {10.1109/ICIP.2019.8803407},
year = {2019},
date = {2019-09-24},
booktitle = {The International Conference on Image Processing (IEEE ICIP 2019)},
organization = {IEEE},
abstract = {Measuring the colorfulness of a natural or virtual scene is critical for many applications in the image processing field, ranging from capturing to display. In this paper, we propose the first deep learning-based colorfulness estimation metric. For this purpose, we develop a color rating model which simultaneously learns to extract the pertinent characteristic color features and the mapping from feature space to the ideal colorfulness scores for a variety of natural colored images. Additionally, we propose to overcome the lack of an adequate annotated dataset by combining/aligning two publicly available colorfulness databases using the results of a new subjective test which employs a common subset of both databases. Using the obtained subjectively annotated dataset with 180 colored images, we finally demonstrate the efficacy of our proposed model over the traditional methods, both quantitatively and qualitatively.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
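For context, the best-known hand-crafted measure that learned estimators such as ColorNet are typically compared against is Hasler and Süsstrunk's colorfulness metric, sketched below. This is a standard baseline shown for reference, not the proposed network.

```python
# Hasler & Suesstrunk colorfulness measure (hand-crafted baseline, for context).
import numpy as np

def colorfulness(img):
    """img: float RGB array of shape (H, W, 3)."""
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    rg = R - G
    yb = 0.5 * (R + G) - B
    std = np.sqrt(np.std(rg) ** 2 + np.std(yb) ** 2)
    mean = np.sqrt(np.mean(rg) ** 2 + np.mean(yb) ** 2)
    return std + 0.3 * mean
```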
Zolanvari, S. M. Iman; Ruano, Susana; Rana, Aakanksha; Cummins, Alan; da Silva, Rogerio Eduardo; Rahbar, Morteza; Smolic, Aljosa DublinCity: Annotated LiDAR Point Cloud and its Applications Conference Forthcoming 30th British Machine Vision Conference (BMVC), Forthcoming. @conference{Zolanvari2019,
title = {DublinCity: Annotated LiDAR Point Cloud and its Applications},
author = {S.M. Iman Zolanvari and Susana Ruano and Aakanksha Rana and Alan Cummins and Rogerio Eduardo da Silva and Morteza Rahbar and Aljosa Smolic},
url = {https://www.researchgate.net/publication/334612380_DublinCity_Annotated_LiDAR_Point_Cloud_and_its_Applications
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/08/BMVC_2019_PointCloud__Copy_-2.pdf},
year = {2019},
date = {2019-09-09},
publisher = {30th British Machine Vision Conference},
organization = {BMVC},
abstract = {Scene understanding of full-scale 3D models of an urban area remains a challenging task. While advanced computer vision techniques offer cost-effective approaches to analyse 3D urban elements, a precise and densely labelled dataset is quintessential. The paper presents the first-ever labelled dataset for a highly dense Aerial Laser Scanning (ALS) point cloud at city-scale. This work introduces a novel benchmark dataset that includes a manually annotated point cloud for over 260 million laser scanning points into 100'000 (approx.) assets from the 2015 Dublin LiDAR point cloud \cite{laefer20172015}. Objects are labelled into 13 classes using hierarchical levels of detail from large (i.e. building, vegetation and ground) to refined (i.e. window, door and tree) elements. To validate the performance of our dataset, two different applications are showcased. Firstly, the labelled point cloud is employed for training Convolutional Neural Networks (CNNs) to classify urban elements. The dataset is tested on the well-known state-of-the-art CNNs (i.e. PointNet, PointNet++ and So-Net). Secondly, the complete ALS dataset is applied as detailed ground truth for city-scale image-based 3D reconstruction.},
keywords = {},
pubstate = {forthcoming},
tppubtype = {conference}
}
Alghamdi, Hana; Grogan, Mairéad; Dahyot, Rozenn Patch-Based Colour Transfer with Optimal Transport Conference Forthcoming European Signal Processing Conference (Eusipco) 2019, Forthcoming. @conference{Alghamdi2019,
title = {Patch-Based Colour Transfer with Optimal Transport},
author = {Hana Alghamdi and Mairéad Grogan and Rozenn Dahyot},
url = {https://v-sense.scss.tcd.ie:443/research/vfx-animation/patch-based-colour-transfer-with-optimal-transport/},
year = {2019},
date = {2019-09-02},
booktitle = {European Signal Processing Conference (Eusipco) 2019},
abstract = {This paper proposes a new colour transfer method with Optimal transport to transfer the colour of a source image to match the colour of a target image of the same scene. We propose to formulate the problem in higher dimensional spaces (than colour spaces) by encoding overlapping neighborhoods of pixels containing colour information as well as spatial information. Since several recoloured candidates are now generated for each pixel in the source image, we define an original procedure to efficiently merge these candidates which allows denoising and artifact removal as well as colour transfer. Experiments show quantitative and qualitative improvements over previous colour transfer methods. Our method can be applied to different contexts of colour transfer such as transferring colour between different camera models, camera settings, illumination conditions and colour retouch styles for photographs.},
keywords = {},
pubstate = {forthcoming},
tppubtype = {conference}
}
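As a minimal illustration of optimal transport applied to colour transfer, the sketch below performs per-channel 1D optimal transport, i.e. the monotone rearrangement between sorted values. It is a deliberate simplification for illustration only, not the authors' patch-based, higher-dimensional formulation.

```python
# Per-channel 1D optimal transport colour mapping (illustrative simplification).
import numpy as np

def ot_transfer_1d(source, target):
    """Map each colour channel of `source` onto the distribution of `target`.
    Both are float arrays of shape (H, W, C)."""
    out = np.empty_like(source)
    for c in range(source.shape[-1]):
        s, t = source[..., c].ravel(), target[..., c].ravel()
        order = np.argsort(s)
        t_sorted = np.sort(t)
        # Sorted-to-sorted assignment is the optimal 1D transport plan.
        ranks = np.linspace(0, len(t_sorted) - 1, len(s)).astype(int)
        mapped = np.empty_like(s)
        mapped[order] = t_sorted[ranks]
        out[..., c] = mapped.reshape(source[..., c].shape)
    return out
```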
Rana, Aakanksha; Singh, Praveer; Valenzise, Giuseppe; Dufaux, Frederic; Komodakis, Nikos; Smolic, Aljosa Deep Tone Mapping Operator for High Dynamic Range Images Journal Article In: IEEE Transactions on Image Processing, vol. 29, pp. 1285-1298, 2019, ISSN: 1057-7149. @article{Rana2019TIP,
title = {Deep Tone Mapping Operator for High Dynamic Range Images},
author = {Aakanksha Rana and Praveer Singh and Giuseppe Valenzise and Frederic Dufaux and Nikos Komodakis and Aljosa Smolic},
url = {https://arxiv.org/abs/1908.04197},
doi = {10.1109/TIP.2019.2936649},
issn = {1057-7149},
year = {2019},
date = {2019-09-02},
journal = {IEEE Transactions on Image Processing},
volume = {29},
pages = {1285-1298},
abstract = {A computationally fast tone mapping operator (TMO) that can quickly adapt to a wide spectrum of high dynamic range (HDR) content is quintessential for visualization on varied low dynamic range (LDR) output devices such as movie screens or standard displays. Existing TMOs can successfully tone-map only a limited number of HDR content and require an extensive parameter tuning to yield the best subjective-quality tone-mapped output. In this paper, we address this problem by proposing a fast, parameter-free and scene-adaptable deep tone mapping operator (DeepTMO) that yields a high-resolution and high-subjective quality tone mapped output. Based on conditional generative adversarial network (cGAN), DeepTMO not only learns to adapt to vast scenic-content (e.g., outdoor, indoor, human, structures, etc.) but also tackles the HDR related scene-specific challenges such as contrast and brightness, while preserving the fine-grained details. We explore 4 possible combinations of Generator-Discriminator architectural designs to specifically address some prominent issues in HDR related deep-learning frameworks like blurring, tiling patterns and saturation artifacts. By exploring different influences of scales, loss-functions and normalization layers under a cGAN setting, we conclude with adopting a multi-scale model for our task. To further leverage on the large-scale availability of unlabeled HDR data, we train our network by generating targets using an objective HDR quality metric, namely Tone Mapping Image Quality Index (TMQI). We demonstrate results both quantitatively and qualitatively, and showcase that our DeepTMO generates high-resolution, high-quality output images over a large spectrum of real-world scenes. Finally, we evaluate the perceived quality of our results by conducting a pair-wise subjective study which confirms the versatility of our method.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
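For context on what a hand-tuned TMO looks like, the sketch below implements the classical Reinhard global operator; its key value and white point are exactly the kind of per-scene parameters that a learned, parameter-free operator such as DeepTMO aims to remove. This is a standard baseline shown for reference, not the proposed network.

```python
# Reinhard et al. global tone mapping operator (classical baseline, for context).
import numpy as np

def reinhard_global(hdr_luminance, a=0.18, l_white=None):
    """Map HDR luminance to [0, 1]. `a` is the key; `l_white` the burn-out point."""
    eps = 1e-6
    l_avg = np.exp(np.mean(np.log(hdr_luminance + eps)))   # log-average luminance
    l_scaled = a * hdr_luminance / l_avg
    if l_white is None:
        l_white = l_scaled.max()
    ld = l_scaled * (1.0 + l_scaled / (l_white ** 2)) / (1.0 + l_scaled)
    return np.clip(ld, 0.0, 1.0)
```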
Perez-Ortiz, Maria; Mikhailiuk, Aliaksei; Zerman, Emin; Hulusic, Vedad; Valenzise, Giuseppe; Mantiuk, Rafal From pairwise comparisons and rating to a unified quality scale Journal Article In: IEEE Transactions on Image Processing, vol. 29, 2019. @article{perezOrtiz2019from,
title = {From pairwise comparisons and rating to a unified quality scale},
author = {Maria Perez-Ortiz and Aliaksei Mikhailiuk and Emin Zerman and Vedad Hulusic and Giuseppe Valenzise and Rafal Mantiuk},
doi = {10.1109/TIP.2019.2936103},
year = {2019},
date = {2019-08-28},
journal = {IEEE Transactions on Image Processing},
volume = {29},
abstract = {The goal of psychometric scaling is the quantification of perceptual experiences, understanding the relationship between an external stimulus, the internal representation and the response. In this paper, we propose a probabilistic framework to fuse the outcome of different psychophysical experimental protocols, namely rating and pairwise comparisons experiments. Such a method can be used for merging existing datasets of subjective nature and for experiments in which both measurements are collected. We analyze and compare the outcomes of both types of experimental protocols in terms of time and accuracy in a set of simulations and experiments with benchmark and real-world image quality assessment datasets, showing the necessity of scaling and the advantages of each protocol and mixing. Although most of our examples focus on image quality assessment, our findings generalize to any other subjective quality-of-experience task.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
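As a minimal illustration of psychometric scaling from pairwise comparisons, the sketch below fits a Bradley-Terry model to a comparison count matrix by maximum likelihood. It is a simplified stand-in for the unified rating-plus-comparison framework proposed in the paper; the logistic observer model and the variable names are assumptions.

```python
# Bradley-Terry scaling of pairwise comparison counts (illustrative sketch).
import numpy as np
from scipy.optimize import minimize

def scale_pairwise(counts):
    """counts[i, j] = number of times condition i was preferred over condition j.
    Returns one quality score per condition, anchored so the first score is 0."""
    n = counts.shape[0]

    def neg_log_likelihood(q):
        d = q[:, None] - q[None, :]           # score differences
        p = 1.0 / (1.0 + np.exp(-d))          # P(i preferred over j)
        return -np.sum(counts * np.log(p + 1e-12))

    res = minimize(neg_log_likelihood, np.zeros(n), method="L-BFGS-B")
    q = res.x
    return q - q[0]                           # scores are defined up to a shift
```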
Lutz, Sebastian; Davey, Mark; Smolic, Aljosa Deep Convolutional Neural Networks for estimating lens distortion parameters Conference Irish Machine Vision and Image Processing Conference (IMVIP) 2019, 2019. @conference{Lutz2019,
title = {Deep Convolutional Neural Networks for estimating lens distortion parameters},
author = {Sebastian Lutz and Mark Davey and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/08/IMVIP___Lens_distortion1.pdf},
year = {2019},
date = {2019-08-28},
booktitle = {Irish Machine Vision and Image Processing Conference (IMVIP) 2019},
abstract = {In this paper we present a convolutional neural network (CNN) to predict multiple lens distortion parameters from a single input image. Unlike other methods, our network is suitable for creating high resolution output, as it directly estimates the parameters from the image, which can then be used to rectify even very high resolution input images. As our method is fully automatic, it is suitable for both casual creatives and professional artists. Our results show that our network accurately predicts the lens distortion parameters of high resolution images and corrects the distortions satisfactorily.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
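For reference, the sketch below shows the standard two-parameter radial (Brown-Conrady) distortion model whose coefficients such a network might predict; it is illustrative only and not the paper's CNN architecture.

```python
# Two-parameter radial (Brown-Conrady) lens distortion model (for reference).
import numpy as np

def distort_normalized(x, y, k1, k2):
    """Apply radial distortion to normalized image coordinates (x, y)."""
    r2 = x ** 2 + y ** 2
    factor = 1.0 + k1 * r2 + k2 * r2 ** 2
    return x * factor, y * factor
```

Given estimated k1 and k2, an image can then be rectified with, for example, OpenCV's cv2.undistort by passing a camera matrix and a distortion vector (k1, k2, 0, 0).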
Vo, Anh Vu; Laefer, Debra F.; Smolic, Aljosa; Zolanvari, S. M. Iman Per-point processing for detailed urban solar estimation with aerial laser scanning and distributed computing Journal Article In: ISPRS Journal of Photogrammetry and Remote Sensing, vol. 155, pp. 119-135, 2019. @article{Vo2019,
title = {Per-point processing for detailed urban solar estimation with aerial laser scanning and distributed computing},
author = {Anh Vu Vo and Debra F. Laefer and Aljosa Smolic and S.M. Iman Zolanvari},
url = {http://www.sciencedirect.com/science/article/pii/S0924271619301510
http://bit.ly/ISPRS_3
},
doi = {https://doi.org/10.1016/j.isprsjprs.2019.06.009},
year = {2019},
date = {2019-07-15},
journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
volume = {155},
pages = {119-135},
abstract = {This paper presents a complete data processing pipeline for improved urban solar potential estimation by applying solar irradiation estimation directly to individual aerial laser scanning (ALS) points in a distributed computing environment. Solar potential is often measured by solar irradiation – the amount of the Sun’s radiant energy received at the Earth’s surface over a period of time. To overcome previous limits of solar radiation estimations based on either two-and-a-half-dimensional raster models or overly simplistic, manually-generated, geometric models, an alternative approach is proposed using dense, urban aerial laser scanning data to enable the incorporation of the true, complex, and heterogeneous elements common in most urban areas. The approach introduces a direct, per-point analysis to fully exploit all details provided by the input point cloud data. To address the resulting computational demands required by the thousands of calculations needed per point for a full-year analysis, a distributed data processing strategy is employed that introduces an atypical data partition strategy. The scalability and performance of the approach are demonstrated on a 1.4-billion-point dataset covering more than 2 km2 of Dublin, Ireland. The reliability and realism of the simulation results are rigorously confirmed with (1) an aerial image collected concurrently with the laser scanning, (2) a terrestrial image acquired from an online source, and (3) a four-day, direct solar radiation collection experiment.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Zerman, Emin; Valenzise, Giuseppe; Smolic, Aljosa Analysing the Impact of Cross-Content Pairs on Pairwise Comparison Scaling Inproceedings In: 11th International Conference on Quality of Multimedia Experience (QoMEX 2019), IEEE 2019. @inproceedings{zerman2019analysing,
title = {Analysing the Impact of Cross-Content Pairs on Pairwise Comparison Scaling},
author = {Emin Zerman and Giuseppe Valenzise and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/03/qomex2019_crossContent_preprint.pdf},
doi = {10.1109/QoMEX.2019.8743295},
year = {2019},
date = {2019-06-06},
booktitle = {11th International Conference on Quality of Multimedia Experience (QoMEX 2019)},
organization = {IEEE},
abstract = {Pairwise comparisons (PWC) methodology is one of the most commonly used methods for subjective quality assessment, especially for computer graphics and multimedia applications. Unlike rating methods, a psychometric scaling operation is required to convert PWC results to numerical subjective quality values. Due to the nature of this scaling operation, the obtained quality scores are relative to the set they are computed in. While it is customary to compare different versions of the same content, in this work we study how cross-content comparisons may benefit psychometric scaling. For this purpose, we use two different video quality databases which have both rating and PWC experiment results. The results show that although same-content comparisons play a major role in the accuracy of psychometric scaling, the use of a small portion of cross-content comparison pairs is indeed beneficial to obtain more accurate quality estimates.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Croci, Simone; Ozcinar, Cagri; Zerman, Emin; Cabrera, Julian; Smolic, Aljosa Voronoi-based Objective Quality Metrics for Omnidirectional Video Inproceedings In: 11th International Conference on Quality of Multimedia Experience (QoMEX 2019), 2019. @inproceedings{Croci2019,
title = {Voronoi-based Objective Quality Metrics for Omnidirectional Video},
author = {Simone Croci and Cagri Ozcinar and Emin Zerman and Julian Cabrera and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/research/voronoi-based-objective-metrics/
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/03/QoMEX2019.pdf},
year = {2019},
date = {2019-06-06},
booktitle = {11th International Conference on Quality of Multimedia Experience (QoMEX 2019)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Croci, Simone; Knorr, Sebastian; Smolic, Aljosa Study on the Perception of Sharpness Mismatch in Stereoscopic Video Inproceedings In: 11th International Conference on Quality of Multimedia Experience (QoMEX 2019), 2019. @inproceedings{Croci2019b,
title = {Study on the Perception of Sharpness Mismatch in Stereoscopic Video},
author = {Simone Croci and Sebastian Knorr and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/05/QoMEX2019_Sharpness_Mismatch.pdf
https://v-sense.scss.tcd.ie:443/research/study-on-the-perception-of-sharpness-mismatch-in-stereoscopic-video/},
year = {2019},
date = {2019-06-04},
booktitle = {11th International Conference on Quality of Multimedia Experience (QoMEX 2019)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Hudon, Matis; Grogan, Mairéad; Pagés, Rafael; Ondřej, Jan; Smolić, Aljoša 2DToonShade: A stroke based toon shading system Journal Article In: Computers & Graphics: X, vol. 1, pp. 100003, 2019, ISSN: 2590-1486. @article{HUDON2019100003,
title = {2DToonShade: A stroke based toon shading system},
author = {Matis Hudon and Mairéad Grogan and Rafael Pagés and Jan Ondřej and Aljoša Smolić},
url = {https://www.sciencedirect.com/science/article/pii/S2590148619300032
https://youtu.be/gmxfUw3BvDo
},
doi = {https://doi.org/10.1016/j.cagx.2019.100003},
issn = {2590-1486},
year = {2019},
date = {2019-06-01},
journal = {Computers & Graphics: X},
volume = {1},
pages = {100003},
abstract = {We present 2DToonShade: a semi-automatic method for creating shades and self-shadows in cel animation. Besides producing attractive images, shades and shadows provide important visual cues about depth, shapes, movement and lighting of the scene. In conventional cel animation, shades and shadows are drawn by hand. As opposed to previous approaches, this method does not rely on a complex 3D reconstruction of the scene: its key advantages are simplicity and ease of use. The tool was designed to stay as close as possible to the natural 2D creative environment and therefore provides an intuitive and user-friendly interface. Our system creates shading based on hand-drawn objects or characters, given very limited guidance from the user. The method employs simple yet very efficient algorithms to create shading directly out of drawn strokes. We evaluate our system through a subjective user study and provide qualitative comparison of our method versus existing professional tools and recent state of the art.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
We present 2DToonShade: a semi-automatic method for creating shades and self-shadows in cel animation. Besides producing attractive images, shades and shadows provide important visual cues about depth, shapes, movement and lighting of the scene. In conventional cel animation, shades and shadows are drawn by hand. As opposed to previous approaches, this method does not rely on a complex 3D reconstruction of the scene: its key advantages are simplicity and ease of use. The tool was designed to stay as close as possible to the natural 2D creative environment and therefore provides an intuitive and user-friendly interface. Our system creates shading based on hand-drawn objects or characters, given very limited guidance from the user. The method employs simple yet very efficient algorithms to create shading directly out of drawn strokes. We evaluate our system through a subjective user study and provide qualitative comparison of our method versus existing professional tools and recent state of the art. |
Le Pendu, Mikael; Guillemot, Christine; Smolic, Aljosa A Fourier Disparity Layer representation for Light Fields Journal Article In: IEEE Transactions on Image Processing, 2019. @article{Pendu2019,
title = {A Fourier Disparity Layer representation for Light Fields},
author = {Le Pendu, Mikael and Christine Guillemot and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/05/FDL_preprint.pdf},
year = {2019},
date = {2019-05-24},
journal = {IEEE Transactions on Image Processing},
abstract = {In this paper, we present a new Light Field representation for efficient Light Field processing and rendering called Fourier Disparity Layers (FDL).
The proposed FDL representation samples the Light Field in the depth (or equivalently the disparity) dimension by decomposing the scene as a discrete sum of layers. The layers can be constructed from various types of Light Field inputs including a set of sub-aperture images, a focal stack, or even a combination of both. From our derivations in the Fourier domain, the layers are simply obtained by a regularized least square regression performed independently at each spatial frequency, which is efficiently parallelized in a GPU implementation. Our model is also used to derive a gradient descent based calibration step that estimates the input view positions and an optimal set of disparity values required for the layer construction. Once the layers are known, they can be simply shifted and filtered to produce different viewpoints of the scene while controlling the focus and simulating a camera aperture of arbitrary shape and size. Our implementation in the Fourier domain allows real time Light Field rendering. Finally, direct applications such as view interpolation or extrapolation and denoising are presented and evaluated.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
In this paper, we present a new Light Field representation for efficient Light Field processing and rendering called Fourier Disparity Layers (FDL).
The proposed FDL representation samples the Light Field in the depth (or equivalently the disparity) dimension by decomposing the scene as a discrete sum of layers. The layers can be constructed from various types of Light Field inputs including a set of sub-aperture images, a focal stack, or even a combination of both. From our derivations in the Fourier domain, the layers are simply obtained by a regularized least square regression performed independently at each spatial frequency, which is efficiently parallelized in a GPU implementation. Our model is also used to derive a gradient descent based calibration step that estimates the input view positions and an optimal set of disparity values required for the layer construction. Once the layers are known, they can be simply shifted and filtered to produce different viewpoints of the scene while controlling the focus and simulating a camera aperture of arbitrary shape and size. Our implementation in the Fourier domain allows real time Light Field rendering. Finally, direct applications such as view interpolation or extrapolation and denoising are presented and evaluated. |
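As a rough illustration of the layer-construction step (a regularized least squares regression solved independently at each spatial frequency), the Python sketch below builds and solves the per-frequency system for a 1D toy light field. The 1D simplification, variable names and regularization weight are illustrative assumptions rather than the authors' implementation.

import numpy as np

def solve_fdl_layers_1d(view_spectra, view_positions, disparities, freqs, lam=0.1):
    """Per-frequency regularized least squares for a 1D toy light field.
    view_spectra: (n_views, n_freqs) complex FFTs of the input views
    view_positions: (n_views,) angular positions u_j
    disparities: (n_layers,) disparity d_k assigned to each layer
    freqs: (n_freqs,) spatial frequencies
    Returns layer spectra of shape (n_layers, n_freqs)."""
    n_layers = len(disparities)
    layers = np.zeros((n_layers, view_spectra.shape[1]), dtype=complex)
    for f, w in enumerate(freqs):
        # A[j, k] = exp(2*pi*i * u_j * d_k * w): layer k shifted into view j
        A = np.exp(2j * np.pi * np.outer(view_positions, disparities) * w)
        b = view_spectra[:, f]
        # Tikhonov-regularized normal equations, solved independently per frequency
        lhs = A.conj().T @ A + lam * np.eye(n_layers)
        rhs = A.conj().T @ b
        layers[:, f] = np.linalg.solve(lhs, rhs)
    return layers

# toy usage: 5 views, 3 layers, 64 spatial samples per view
rng = np.random.default_rng(0)
views = rng.standard_normal((5, 64))
spectra = np.fft.rfft(views, axis=1)
freqs = np.fft.rfftfreq(64)
layers = solve_fdl_layers_1d(spectra, np.linspace(-2, 2, 5), np.array([-1.0, 0.0, 1.0]), freqs)
print(layers.shape)  # (3, 33)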
Ozcinar, Cagri; Cabrera, Julian; Smolic, Aljosa Visual Attention-Aware Omnidirectional Video Streaming Using Optimal Tiles for Virtual Reality Journal Article In: IEEE Journal on Emerging and Selected Topics in Circuits and Systems , 2019. @article{Ozcinar2019,
title = {Visual Attention-Aware Omnidirectional Video Streaming Using Optimal Tiles for Virtual Reality},
author = {Cagri Ozcinar and Julian Cabrera and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/research/va-aware-odv-streaming/
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/03/JETCAS_SI_immersive_2018_pc.pdf},
year = {2019},
date = {2019-05-15},
journal = {IEEE Journal on Emerging and Selected Topics in Circuits and Systems },
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
Rana, Aakanksha; Ozcinar, Cagri; Smolic, Aljosa Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality Inproceedings In: 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. @inproceedings{Rana2019,
title = {Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality},
author = {Aakanksha Rana and Cagri Ozcinar and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/02/ICASSP2019_multimodal.pdf},
year = {2019},
date = {2019-05-12},
booktitle = {44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Grogan, Mairead; Dahyot, Rozenn L2 Divergence for robust colour transfer Journal Article In: Computer Vision and Image Understanding, vol. 181, pp. 39-49, 2019. @article{Grogan2019,
title = {L2 Divergence for robust colour transfer},
author = {Mairead Grogan and Rozenn Dahyot},
url = {https://v-sense.scss.tcd.ie:443/research/vfx-animation/l2-divergence-for-robust-colour-transfer/},
doi = {10.1016/j.cviu.2019.02.002},
year = {2019},
date = {2019-04-01},
journal = {Computer Vision and Image Understanding},
volume = {181},
pages = {39-49},
abstract = {Optimal Transport is a very popular framework for performing colour transfer in images and videos. We have proposed an alternative framework where the cost function used for inferring a parametric transfer function is defined as the robust L2 divergence between two probability density functions. In this paper, we show that our approach combines many advantages of state of the art techniques and outperforms many recent algorithms as measured quantitatively with standard quality metrics, and qualitatively using perceptual studies. Mathematically, our formulation is presented in contrast to the Optimal Transport cost function that shares similarities with our cost function. Our formulation, however, is more flexible as it allows colour correspondences that may be available to be taken into account and performs well despite potential occurrences of correspondence outlier pairs. Our algorithm is shown to be fast, robust and it easily allows for user interaction providing freedom for artists to fine tune the recoloured images and videos.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Optimal Transport is a very popular framework for performing colour transfer in images and videos. We have proposed an alternative framework where the cost function used for inferring a parametric transfer function is defined as the robust L2 divergence between two probability density functions. In this paper, we show that our approach combines many advantages of state of the art techniques and outperforms many recent algorithms as measured quantitatively with standard quality metrics, and qualitatively using perceptual studies. Mathematically, our formulation is presented in contrast to the Optimal Transport cost function that shares similarities with our cost function. Our formulation, however, is more flexible as it allows colour correspondences that may be available to be taken into account and performs well despite potential occurrences of correspondence outlier pairs. Our algorithm is shown to be fast, robust and it easily allows for user interaction providing freedom for artists to fine tune the recoloured images and videos. |
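For illustration of the cost function: the L2 divergence between two Gaussian mixture models has a closed form based on the identity int N(x; m1, S1) N(x; m2, S2) dx = N(m1; m2, S1 + S2). The Python sketch below computes it for two small colour mixtures; the isotropic covariances and toy values are assumptions, and the paper's full method additionally optimizes a parametric colour transfer function, which is not shown here.

import numpy as np
from scipy.stats import multivariate_normal

def gmm_cross_term(weights_a, means_a, covs_a, weights_b, means_b, covs_b):
    """Closed-form integral of the product of two Gaussian mixtures."""
    total = 0.0
    for wa, ma, Sa in zip(weights_a, means_a, covs_a):
        for wb, mb, Sb in zip(weights_b, means_b, covs_b):
            total += wa * wb * multivariate_normal.pdf(ma, mean=mb, cov=Sa + Sb)
    return total

def gmm_l2_distance(wa, ma, Sa, wb, mb, Sb):
    """Robust L2 divergence between two GMMs: int (p - q)^2 dx."""
    return (gmm_cross_term(wa, ma, Sa, wa, ma, Sa)
            - 2.0 * gmm_cross_term(wa, ma, Sa, wb, mb, Sb)
            + gmm_cross_term(wb, mb, Sb, wb, mb, Sb))

# toy example in RGB space: two 2-component mixtures with isotropic covariances
d = 3
ma = [np.array([0.2, 0.2, 0.2]), np.array([0.8, 0.7, 0.6])]
mb = [np.array([0.3, 0.2, 0.2]), np.array([0.7, 0.7, 0.5])]
Sa = [0.01 * np.eye(d)] * 2
Sb = [0.01 * np.eye(d)] * 2
print(gmm_l2_distance([0.5, 0.5], ma, Sa, [0.5, 0.5], mb, Sb))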
Dudek, Roman; Croci, Simone; Smolic, Aljosa; Knorr, Sebastian Robust Global and Local Color Matching in Stereoscopic Omnidirectional Content Journal Article In: Elsevier Signal Processing: Image Communication, vol. 74, pp. 231-241, 2019. @article{Roman2019,
title = {Robust Global and Local Color Matching in Stereoscopic Omnidirectional Content},
author = {Roman Dudek and Simone Croci and Aljosa Smolic and Sebastian Knorr},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/05/S3D_360Video_ColorMatching.pdf
https://v-sense.scss.tcd.ie:443/research/robust-global-and-local-color-matching-in-stereoscopic-omnidirectional-content/},
year = {2019},
date = {2019-03-13},
journal = {Elsevier Signal Processing: Image Communication},
volume = {74},
pages = {231-241},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
Zheng, Xu; Chalasani, Tejo; Ghosal, Koustav; Lutz, Sebastian; Smolic, Aljosa STaDA: Style Transfer as Data Augmentation Conference 14th International Conference on Computer Vision Theory and Applications, 2019. @conference{Zheng2019,
title = {STaDA: Style Transfer as Data Augmentation},
author = {Xu Zheng and Tejo Chalasani and Koustav Ghosal and Sebastian Lutz and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/05/STaDA-StyleTransferasDataAugmentation-1.pdf},
doi = {10.5220/0007353401070114},
year = {2019},
date = {2019-02-27},
booktitle = {14th International Conference on Computer Vision Theory and Applications},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
da Silva, Rogerio E.; Ondrej, Jan; Smolic, Aljosa Using LSTM for Automatic Classification of Human Motion Capture Data Conference 14th International Conference on Computer Graphics Theory and Applications, 2019. @conference{daSilva2019,
title = {Using LSTM for Automatic Classification of Human Motion Capture Data},
author = {Rogerio E. da Silva and Jan Ondrej and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/grapp_2019_15_cr-3/},
doi = {10.5220/0007349902360243},
year = {2019},
date = {2019-02-25},
organization = {14th International Conference on Computer Graphics Theory and Applications},
abstract = {Creative studios tend to produce an overwhelming amount of content every day, and being able to manage these data and reuse them in new productions represents a way of reducing costs and increasing productivity and profit. This work is part of a project aiming to develop reusable assets in creative productions. This paper describes our first attempt at using deep learning to classify human motion from motion capture files. It relies on a long short-term memory network (LSTM) trained to recognize actions from a simplified ontology of basic actions such as walking, running or jumping. Our solution was able to recognize several actions with an accuracy over 95% in the best cases.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Creative studios tend to produce an overwhelming amount of content every day, and being able to manage these data and reuse them in new productions represents a way of reducing costs and increasing productivity and profit. This work is part of a project aiming to develop reusable assets in creative productions. This paper describes our first attempt at using deep learning to classify human motion from motion capture files. It relies on a long short-term memory network (LSTM) trained to recognize actions from a simplified ontology of basic actions such as walking, running or jumping. Our solution was able to recognize several actions with an accuracy over 95% in the best cases. |
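A minimal PyTorch sketch of the kind of LSTM classifier described above follows; the joint count, layer sizes and number of action classes are illustrative assumptions, not the authors' architecture.

import torch
import torch.nn as nn

class MotionLSTMClassifier(nn.Module):
    """Classify a motion-capture clip (sequence of per-frame joint vectors)
    into one of a few basic actions such as walking, running or jumping."""
    def __init__(self, n_joints=31, n_channels=3, hidden=128, n_actions=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_joints * n_channels,
                            hidden_size=hidden, num_layers=2,
                            batch_first=True, dropout=0.3)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, x):
        # x: (batch, frames, n_joints * n_channels)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])  # logits over action classes

# toy usage: a batch of 4 clips, 120 frames each
model = MotionLSTMClassifier()
clips = torch.randn(4, 120, 31 * 3)
logits = model(clips)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 2, 3]))
loss.backward()
print(logits.shape)  # torch.Size([4, 5])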
Moynihan, Matthew; Pagés, Rafael; Smolic, Aljosa Spatio-Temporal Upsampling for Free Viewpoint Video Point Clouds Conference Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications VISAPP, vol. 5, SciTePress, 2019, ISBN: 978-989-758-354-4. @conference{Moynihan2019,
title = {Spatio-Temporal Upsampling for Free Viewpoint Video Point Clouds},
author = {Matthew Moynihan and Rafael Pagés and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/visigrapp_2019camera-ready/},
doi = {10.5220/0007361606840692 },
isbn = {978-989-758-354-4},
year = {2019},
date = {2019-02-25},
booktitle = {Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications VISAPP},
volume = {5},
pages = {684-692},
publisher = {SciTePress},
abstract = {This paper presents an approach to upsampling point cloud sequences captured through a wide baseline camera setup in a spatio-temporally consistent manner. The system uses edge-aware scene flow to understand the movement of 3D points across a free-viewpoint video scene to impose temporal consistency. In addition to geometric upsampling, a Hausdorff distance quality metric is used to filter noise and further improve the density of each point cloud. Results show that the system produces temporally consistent point clouds, not only reducing errors and noise but also recovering details that were lost in frame-by-frame dense point cloud reconstruction. The system has been successfully tested on sequences captured with both static and handheld cameras.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
This paper presents an approach to upsampling point cloud sequences captured through a wide baseline camera setup in a spatio-temporally consistent manner. The system uses edge-aware scene flow to understand the movement of 3D points across a free-viewpoint video scene to impose temporal consistency. In addition to geometric upsampling, a Hausdorff distance quality metric is used to filter noise and further improve the density of each point cloud. Results show that the system produces temporally consistent point clouds, not only reducing errors and noise but also recovering details that were lost in frame-by-frame dense point cloud reconstruction. The system has been successfully tested on sequences captured with both static and handheld cameras. |
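To illustrate the Hausdorff-distance filtering idea mentioned above, the Python sketch below computes nearest-neighbour distances between an upsampled cloud and a reference cloud and drops the most distant points; the percentile threshold and function names are illustrative assumptions, not the authors' exact criterion.

import numpy as np
from scipy.spatial import cKDTree

def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two point clouds of shape (N, 3) and (M, 3)."""
    d_ab = cKDTree(points_b).query(points_a)[0]  # nearest-neighbour distances A -> B
    d_ba = cKDTree(points_a).query(points_b)[0]  # nearest-neighbour distances B -> A
    return max(d_ab.max(), d_ba.max())

def filter_outliers(upsampled, reference, percentile=95.0):
    """Drop upsampled points whose distance to the reference cloud exceeds a
    percentile threshold -- a simple stand-in for a Hausdorff-style filter."""
    d = cKDTree(reference).query(upsampled)[0]
    keep = d <= np.percentile(d, percentile)
    return upsampled[keep]

# toy usage: a clean cloud plus 20 far-away noise points
rng = np.random.default_rng(1)
ref = rng.standard_normal((1000, 3))
ups = np.vstack([ref + 0.01 * rng.standard_normal(ref.shape),
                 rng.uniform(5, 6, size=(20, 3))])
print(hausdorff_distance(ups, ref), filter_outliers(ups, ref).shape)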
O’Dwyer, Néill; Johnson, Nicholas Exploring volumetric video and narrative through Samuel Beckett’s Play Journal Article In: International Journal of Performance Arts and Digital Media, 2019, ISSN: 1479-4713. @article{O’Dwyer2019,
title = {Exploring volumetric video and narrative through Samuel Beckett’s Play},
author = {Néill O’Dwyer and Nicholas Johnson},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/01/Exploring-volumetric-video-and-narrative-through-Samuel-Beckett’s-Play-14794713.2019.pdf},
doi = {10.1080/14794713.2019.1567243},
issn = {1479-4713},
year = {2019},
date = {2019-01-15},
journal = {International Journal of Performance Arts and Digital Media},
abstract = {This paper draws upon the primary research of an interdepartmental collaborative practice-as-research project that took place at Trinity College during 2017, in which a Samuel Beckett play, entitled Play, was reinterpreted for virtual reality. It included contributions from the Departments of Computer Science, Drama and Electrical and Electronic Engineering. The goal of this article is to offer some expanded philosophical and aesthetic reflections on the practice, now that the major production processes are completed. The primary themes that are dealt with in this paper are the reorganised rules concerning: (1) making work in the VR medium and (2) the impact of the research on viewership and content engagement in digital culture. In doing so we draw on the technological philosophy of Bernard Stiegler, who extends the legacy of Gilles Deleuze and Gilbert Simondon, to reflect on the psychic, sociopolitical and economic impacts of VR technology on cognition, subjectivity and identity in the contemporary digitalised world.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
This paper draws upon the primary research of an interdepartmental collaborative practice-as-research project that took place at Trinity College during 2017, in which a Samuel Beckett play, entitled Play, was reinterpreted for virtual reality. It included contributions from the Departments of Computer Science, Drama and Electrical and Electronic Engineering. The goal of this article is to offer some expanded philosophical and aesthetic reflections on the practice, now that the major production processes are completed. The primary themes that are dealt with in this paper are the reorganised rules concerning: (1) making work in the VR medium and (2) the impact of the research on viewership and content engagement in digital culture. In doing so we draw on the technological philosophy of Bernard Stiegler, who extends the legacy of Gilles Deleuze and Gilbert Simondon, to reflect on the psychic, sociopolitical and economic impacts of VR technology on cognition, subjectivity and identity in the contemporary digitalised world. |
Zerman, Emin; Gao, Pan; Ozcinar, Cagri; Smolic, Aljosa Subjective and Objective Quality Assessment for Volumetric Video Compression Inproceedings In: IS&T Electronic Imaging, Image Quality and System Performance XVI, 2019. @inproceedings{zerman2019subjective,
title = {Subjective and Objective Quality Assessment for Volumetric Video Compression},
author = {Emin Zerman and Pan Gao and Cagri Ozcinar and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/research/6dof/quality-assessment-for-fvv-compression/
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2018/11/iqsp2019_preprint.pdf},
doi = {10.2352/ISSN.2470-1173.2019.10.IQSP-323},
year = {2019},
date = {2019-01-14},
booktitle = {IS&T Electronic Imaging, Image Quality and System Performance XVI},
abstract = {Volumetric video is becoming easier to capture and display thanks to recent technical developments in acquisition and display technologies. Using point clouds is a popular way to represent volumetric video for augmented or virtual reality applications. This representation, however, requires a large number of points to achieve a high quality of experience and needs compression before storage and transmission. In this paper, we study the subjective and objective quality assessment results for volumetric video compression, using a state-of-the-art compression algorithm: MPEG Point Cloud Compression Test Model Category 2 (TMC2). We conduct subjective experiments to find the perceptual impacts on compressed volumetric video with different quantization parameters and point counts. Additionally, we find the relationship between the state-of-the-art objective quality metrics and the acquired subjective quality assessment results. To the best of our knowledge, this study is the first to consider TMC2 compression for volumetric video represented as coloured point clouds and study its effects on the perceived quality. The results show that the effect of input point counts for TMC2 compression is not meaningful, and some geometry distortion metrics disagree with the perceived quality. The developed database is publicly available to promote the study of volumetric video compression.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Volumetric video is becoming easier to capture and display thanks to recent technical developments in acquisition and display technologies. Using point clouds is a popular way to represent volumetric video for augmented or virtual reality applications. This representation, however, requires a large number of points to achieve a high quality of experience and needs compression before storage and transmission. In this paper, we study the subjective and objective quality assessment results for volumetric video compression, using a state-of-the-art compression algorithm: MPEG Point Cloud Compression Test Model Category 2 (TMC2). We conduct subjective experiments to find the perceptual impacts on compressed volumetric video with different quantization parameters and point counts. Additionally, we find the relationship between the state-of-the-art objective quality metrics and the acquired subjective quality assessment results. To the best of our knowledge, this study is the first to consider TMC2 compression for volumetric video represented as coloured point clouds and study its effects on the perceived quality. The results show that the effect of input point counts for TMC2 compression is not meaningful, and some geometry distortion metrics disagree with the perceived quality. The developed database is publicly available to promote the study of volumetric video compression. |
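One family of objective geometry metrics referred to above is the point-to-point (D1-style) PSNR. The Python sketch below is a simplified symmetric version that uses the reference bounding-box diagonal as the peak value; this peak definition and the toy data are assumptions, and MPEG reference software defines its own peak values.

import numpy as np
from scipy.spatial import cKDTree

def point_to_point_psnr(reference, degraded):
    """Simplified symmetric point-to-point geometry PSNR between a reference
    point cloud and its compressed/degraded version, both of shape (N, 3)."""
    def mse(src, dst):
        d = cKDTree(dst).query(src)[0]  # nearest-neighbour distances
        return np.mean(d ** 2)
    sym_mse = max(mse(reference, degraded), mse(degraded, reference))
    peak = np.linalg.norm(reference.max(axis=0) - reference.min(axis=0))  # bbox diagonal
    return 10.0 * np.log10(peak ** 2 / sym_mse)

# toy usage: degrade a cloud with small geometric noise
rng = np.random.default_rng(2)
ref = rng.uniform(0, 1, size=(5000, 3))
deg = ref + 0.002 * rng.standard_normal(ref.shape)
print(point_to_point_psnr(ref, deg))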
2018
|
O’Dwyer, Néill; Johnson, Nicholas; Bates, Enda; Pagés, Rafael; Ondřej, Jan; Amplianitis, Konstantinos; Monaghan, David; Smolic, Aljoša Samuel Beckett in Virtual Reality: Exploring narrative using free viewpoint video Journal Article In: The MIT Press Journals - Leonardo, pp. 10, 2018. @article{O’Dwyer2018,
title = {Samuel Beckett in Virtual Reality: Exploring narrative using free viewpoint video},
author = {Néill O’Dwyer and Nicholas Johnson and Enda Bates and Rafael Pagés and Jan Ondřej and Konstantinos Amplianitis and David Monaghan and Aljoša Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/01/leon_a_017211-3.pdf
https://www.mitpressjournals.org/doi/abs/10.1162/leon_a_01721},
doi = {10.1162/leon_a_01721},
year = {2018},
date = {2018-12-26},
journal = { The MIT Press Journals - Leonardo},
pages = {10},
abstract = {Building on a poster presentation at Siggraph 2018 [1], this article describes an investigation of interactive narrative in virtual reality (VR) through Samuel Beckett’s theatrical text Play. Actors are captured in a green screen environment using free-viewpoint video (FVV). Built in a game engine, the scene is complete with binaural spatial audio and six degrees of freedom of movement. The project explores how ludic qualities in the original text elicit the conversational and interactive specificities of the digital medium. The work affirms potential for interactive narrative in VR, opens new experiences of the text, and highlights the reorganisation of the author–audience dynamic.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Building on a poster presentation at Siggraph 2018 [1], this article describes an investigation of interactive narrative in virtual reality (VR) through Samuel Beckett’s theatrical text Play. Actors are captured in a green screen environment using free-viewpoint video (FVV). Built in a game engine, the scene is complete with binaural spatial audio and six degrees of freedom of movement. The project explores how ludic qualities in the original text elicit the conversational and interactive specificities of the digital medium. The work affirms potential for interactive narrative in VR, opens new experiences of the text, and highlights the reorganisation of the author–audience dynamic. |
Knorr, Sebastian; Ozcinar, Cagri; Fearghail, Colm O; Smolic, Aljosa Director's Cut - A Combined Dataset for Visual Attention Analysis in Cinematic VR Content Inproceedings In: The 15th ACM SIGGRAPH European Conference on Visual Media Production, 2018. @inproceedings{Knorr2018,
title = {Director's Cut - A Combined Dataset for Visual Attention Analysis in Cinematic VR Content},
author = {Sebastian Knorr and Cagri Ozcinar and Colm O Fearghail and Aljosa Smolic },
url = {https://v-sense.scss.tcd.ie:443/research/3dof/directors-cut-research/
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2018/09/CVMP2018_DirectorsCut_public-1.pdf
},
doi = {10.1145/3278471.3278472 },
year = {2018},
date = {2018-12-13},
booktitle = {The 15th ACM SIGGRAPH European Conference on Visual Media Production},
abstract = {Methods of storytelling in cinema have well-established conventions that have been built over the course of its history and the development of the format. In 360° film many of the techniques that have formed part of this cinematic language or visual narrative are not easily applied or are not applicable due to the nature of the format, i.e., the content is not contained within the borders of the screen. In this paper, we analyze how end-users view 360° video in the presence of directional cues and evaluate if they are able to follow the actual story of narrative 360° films. We first let filmmakers create an intended scan-path, the so-called director's cut, by setting position markers in the equirectangular representation of the omnidirectional content for eight short 360° films. Alongside this, the filmmakers provided additional information regarding directional cues and plot points. Then, we performed a subjective test with 20 participants watching the films with a head-mounted display and recorded the center position of the viewports. The resulting scan-paths of the participants are then compared against the director's cut using different scan-path similarity measures. In order to better visualize the similarity between the scan-paths, we introduce a new metric which measures and visualizes the viewport overlap between the participants' scan-paths and the director's cut. Finally, the entire dataset, i.e. the director's cuts including the directional cues and plot points as well as the scan-paths of the test subjects, is publicly available with this paper.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Methods of storytelling in cinema have well-established conventions that have been built over the course of its history and the development of the format. In 360° film many of the techniques that have formed part of this cinematic language or visual narrative are not easily applied or are not applicable due to the nature of the format, i.e., the content is not contained within the borders of the screen. In this paper, we analyze how end-users view 360° video in the presence of directional cues and evaluate if they are able to follow the actual story of narrative 360° films. We first let filmmakers create an intended scan-path, the so-called director's cut, by setting position markers in the equirectangular representation of the omnidirectional content for eight short 360° films. Alongside this, the filmmakers provided additional information regarding directional cues and plot points. Then, we performed a subjective test with 20 participants watching the films with a head-mounted display and recorded the center position of the viewports. The resulting scan-paths of the participants are then compared against the director's cut using different scan-path similarity measures. In order to better visualize the similarity between the scan-paths, we introduce a new metric which measures and visualizes the viewport overlap between the participants' scan-paths and the director's cut. Finally, the entire dataset, i.e. the director's cuts including the directional cues and plot points as well as the scan-paths of the test subjects, is publicly available with this paper. |
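To give a concrete flavour of the scan-path comparison: a basic building block is the great-circle (angular) distance between the viewer's and the director's viewport centres over time, from which a simple overlap ratio can be derived. The Python sketch below is an illustrative simplification with an assumed viewport radius; the paper's viewport-overlap metric differs in detail.

import numpy as np

def to_unit_vector(yaw_deg, pitch_deg):
    """Convert yaw/pitch angles (degrees) of a viewport centre to a 3D unit vector."""
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    return np.stack([np.cos(pitch) * np.cos(yaw),
                     np.cos(pitch) * np.sin(yaw),
                     np.sin(pitch)], axis=-1)

def great_circle_distance(path_a, path_b):
    """Per-frame angular distance (degrees) between two scan-paths given as
    (frames, 2) arrays of [yaw, pitch] viewport centres."""
    va, vb = to_unit_vector(*path_a.T), to_unit_vector(*path_b.T)
    cosang = np.clip(np.sum(va * vb, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cosang))

def viewport_overlap_ratio(path_a, path_b, viewport_radius_deg=45.0):
    """Fraction of frames where the two viewport centres are close enough for
    the viewports to overlap (the radius is an illustrative assumption)."""
    return np.mean(great_circle_distance(path_a, path_b) < 2 * viewport_radius_deg)

# toy usage: the viewer drifts away from the director's cut after frame 50
director = np.zeros((100, 2))
viewer = np.zeros((100, 2))
viewer[50:, 0] = np.linspace(0, 120, 50)
print(viewport_overlap_ratio(director, viewer))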
Ozcinar, Cagri; Cabrera, Julian; Smolic, Aljosa Omnidirectional Video Streaming Using Visual Attention-Driven Dynamic Tiling for VR Inproceedings In: IEEE International Conference on Visual Communications and Image Processing (VCIP) 2018, Taichung, Taiwan, 2018. @inproceedings{Ozcinar2018b,
title = {Omnidirectional Video Streaming Using Visual Attention-Driven Dynamic Tiling for VR},
author = {Cagri Ozcinar and Julian Cabrera and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/03/VCIP_2018.pdf
https://v-sense.scss.tcd.ie:443/research/3dof/360-degree-video-coding-and-streaming-for-virtual-reality/},
year = {2018},
date = {2018-12-09},
booktitle = {IEEE International Conference on Visual Communications and Image Processing (VCIP) 2018},
address = {Taichung, Taiwan},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Fearghail, Colm O; Ozcinar, Cagri; Knorr, Sebastian; Smolic, Aljosa Director's Cut - Analysis of Aspects of Interactive Storytelling for VR Films Inproceedings In: International Conference for Interactive Digital Storytelling (ICIDS) 2018, Dublin, Ireland, 2018, (Received the runner-up best full paper award). @inproceedings{Fearghail2018,
title = {Director's Cut - Analysis of Aspects of Interactive Storytelling for VR Films},
author = {Colm O Fearghail and Cagri Ozcinar and Sebastian Knorr and Aljosa Smolic },
url = {https://v-sense.scss.tcd.ie:443/research/3dof/directors-cut-analysis-of-aspects-of-interactive-storytelling-for-vr-films/
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2018/12/storyTelling.pdf},
year = {2018},
date = {2018-12-05},
booktitle = {International Conference for Interactive Digital Storytelling (ICIDS) 2018},
address = {Dublin, Ireland},
abstract = {To explore methods that are currently used by professional virtual reality (VR) filmmakers to tell their stories and guide users, we analyze how end-users view 360° video in the presence of directional cues and evaluate if they are able to follow the actual story of narrative 360° films. In this context, we first collected data from five professional VR filmmakers. The data contains eight 360° videos, the director's cut, which is the intended viewing direction of the director, plot points and directional cues used for user guidance. Then, we performed a subjective experiment with 20 test subjects viewing the videos while their head orientation was recorded. Finally, we present and discuss the experimental results and show, among others, that visual discomfort and disorientation on the part of the viewer not only lessen the immersive quality of the films but also cause difficulties in the viewer gaining a full understanding of the narrative that the director wished them to view.},
note = {Received the runner-up best full paper award},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
To explore methods that are currently used by professional virtual reality (VR) filmmakers to tell their stories and guide users, we analyze how end-users view 360° video in the presence of directional cues and evaluate if they are able to follow the actual story of narrative 360° films. In this context, we first collected data from five professional VR filmmakers. The data contains eight 360° videos, the director's cut, which is the intended viewing direction of the director, plot points and directional cues used for user guidance. Then, we performed a subjective experiment with 20 test subjects viewing the videos while their head orientation was recorded. Finally, we present and discuss the experimental results and show, among others, that visual discomfort and disorientation on the part of the viewer not only lessen the immersive quality of the films but also cause difficulties in the viewer gaining a full understanding of the narrative that the director wished them to view. |
Dowling, Declan; Fearghail, Colm O; Smolic, Aljosa; Knorr, Sebastian Faoladh : A Case Study in Cinematic VR Storytelling and Production Inproceedings In: International Conference for Interactive Digital Storytelling, 2018. @inproceedings{dowling2018,
title = {Faoladh : A Case Study in Cinematic VR Storytelling and Production},
author = {Declan Dowling and Colm O Fearghail and Aljosa Smolic and Sebastian Knorr},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2018/10/Faoladh_A_Case_Study_in_Cinematic_VR_Storytelling_and_Production-1.pdf},
year = {2018},
date = {2018-12-05},
booktitle = {International Conference for Interactive Digital Storytelling},
abstract = {Portraying traditional cinematic narratives in virtual reality (VR) is an emerging practice in which the methods normally associated with cinematic storytelling often need to be adapted to the 360° format. In this paper we investigate some proposed cinematic practices for narrative storytelling in a cinematic VR film set in late 9th-century Ireland that follows the perilous journey of a young Celt as he evades being captured by Viking raiders. From this we analyze the fidelity of those practices with results collected from YouTube Analytics.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Portraying traditional cinematic narratives in virtual reality (VR) is an emerging practice in which the methods normally associated with cinematic storytelling often need to be adapted to the 360° format. In this paper we investigate some proposed cinematic practices for narrative storytelling in a cinematic VR film set in late 9th-century Ireland that follows the perilous journey of a young Celt as he evades being captured by Viking raiders. From this we analyze the fidelity of those practices with results collected from YouTube Analytics. |
Fearghail, Colm O; Ozcinar, Cagri; Knorr, Sebastian; Smolic, Aljosa Director's Cut - Analysis of VR Film Cuts for Interactive Storytelling Inproceedings In: International Conference on 3D Immersion (IC3D) 2018, Brussels, Belgium, 2018. @inproceedings{Fearghail2018b,
title = {Director's Cut - Analysis of VR Film Cuts for Interactive Storytelling},
author = {Colm O Fearghail and Cagri Ozcinar and Sebastian Knorr and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/research/3dof/directors-cut-analysis-of-vr-film-cuts-for-interactive-storytelling/
https://v-sense.scss.tcd.ie:443/wp-content/uploads/2018/12/2018_IC3D_DirectorCut_AttentionStoryTelling.pdf},
year = {2018},
date = {2018-12-05},
booktitle = {International Conference on 3D Immersion (IC3D) 2018},
address = {Brussels, Belgium},
abstract = {The usage of film cuts, or transitions, is a powerful technique in interactive storytelling to express the film story by leading the viewer’s attention. To explore how existing transition techniques are currently being used by professional 360-degree filmmakers, this paper investigates the impact of transitions and additional graphical elements from a storytelling perspective. We base this on the recently published Director's Cut dataset which contains professional 360-degree films with the director's intended viewing direction (i.e., the director's cut) and test subjects' scan-paths. Our objective is to examine widely used transition techniques in professional 360-degree film, and with our findings to guide filmmakers in the storytelling and editing process. To the authors’ knowledge, this is the first study to analyze professionally prepared VR film cuts concerning the distance between viewers' scan-paths and the director's cut. We observed that the intended viewing direction is required for the viewer to understand the story best and not miss the director's plot points. The transition is a point where the viewers are presented with a new scene and are required to orientate themselves within it. Thus, if there is a considerable distance mismatch between the intended and actual viewing, the viewer is not in the best position to understand the scene properly. Our results show that the use of simple graphics can serve as a reference for viewers as the transition happens and they are presented with a new immersive environment.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
The usage of film cuts, or transitions, is a powerful technique in interactive storytelling to express the film story by leading the viewer’s attention. To explore how existing transition techniques are currently being used by professional 360-degree filmmakers, this paper investigates the impact of transitions and additional graphical elements from a storytelling perspective. We base this on the recently published Director's Cut dataset which contains professional 360-degree films with the director's intended viewing direction (i.e., the director's cut) and test subjects' scan-paths. Our objective is to examine widely used transition techniques in professional 360-degree film, and with our findings to guide filmmakers in the storytelling and editing process. To the authors’ knowledge, this is the first study to analyze professionally prepared VR film cuts concerning the distance between viewers' scan-paths and the director's cut. We observed that the intended viewing direction is required for the viewer to understand the story best and not miss the director's plot points. The transition is a point where the viewers are presented with a new scene and are required to orientate themselves within it. Thus, if there is a considerable distance mismatch between the intended and actual viewing, the viewer is not in the best position to understand the scene properly. Our results show that the use of simple graphics can serve as a reference for viewers as the transition happens and they are presented with a new immersive environment. |
Knorr, Sebastian; Hudon, Matis; Cabrera, Julian; Sikora, Thomas; Smolic, Aljosa DeepStereoBrush: Interactive Depth Map Creation Inproceedings In: International Conference on 3D Immersion, 2018, (Received the Lumiere Award for the best scientific paper). @inproceedings{knorr2018b,
title = {DeepStereoBrush: Interactive Depth Map Creation},
author = {Sebastian Knorr and Matis Hudon and Julian Cabrera and Thomas Sikora and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2018/10/DeepStereoBrush_Stereopsia-6.pdf},
year = {2018},
date = {2018-12-05},
booktitle = {International Conference on 3D Immersion},
abstract = {In this paper, we introduce a novel interactive depth map creation approach for image sequences which uses depth scribbles as input at user-defined keyframes. These scribbled depth values are then propagated within these keyframes and across the entire sequence using a 3-dimensional geodesic distance transform (3D-GDT). In order to further improve the depth estimation of the intermediate frames, we make use of a convolutional neural network (CNN) in an unconventional manner. Our process is based on online learning which allows us to specifically train a disposable network for each sequence individually using the user generated depth at keyframes along with corresponding RGB images as training pairs. Thus, we actually take advantage of one of the most common issues in deep learning: over-fitting. Furthermore, we integrated this approach into a professional interactive depth map creation application and compared our results against the state of the art in interactive depth map creation.},
note = {Received the Lumiere Award for the best scientific paper},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In this paper, we introduce a novel interactive depth map creation approach for image sequences which uses depth scribbles as input at user-defined keyframes. These scribbled depth values are then propagated within these keyframes and across the entire sequence using a 3-dimensional geodesic distance transform (3D-GDT). In order to further improve the depth estimation of the intermediate frames, we make use of a convolutional neural network (CNN) in an unconventional manner. Our process is based on online learning which allows us to specifically train a disposable network for each sequence individually using the user generated depth at keyframes along with corresponding RGB images as training pairs. Thus, we actually take advantage of one of the most common issues in deep learning: over-fitting. Furthermore, we integrated this approach into a professional interactive depth map creation application and compared our results against the state of the art in interactive depth map creation. |
O’Dwyer, Néill; Ondřej, Jan; Pagés, Rafael; Amplianitis, Konstantinos; Smolić, Aljoša Jonathan Swift: Augmented Reality Application for Trinity Library’s Long Room Conference International Conference on Interactive Digital Storytelling (ICIDS 2018), 2018. @conference{O’Dwyer2018b,
title = {Jonathan Swift: Augmented Reality Application for Trinity Library’s Long Room},
author = {Néill O’Dwyer and Jan Ondřej and Rafael Pagés and Konstantinos Amplianitis and Aljoša Smolić},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/01/ODwyer2018_Chapter_JonathanSwiftAugmentedRealityA-2.pdf},
doi = {10.1007/978-3-030-04028-4_39},
year = {2018},
date = {2018-12-05},
pages = {348-351},
organization = {International Conference on Interactive Digital Storytelling (ICIDS 2018)},
abstract = {This demo paper describes a project that engages cutting-edge free viewpoint video (FVV) techniques for developing content for an augmented reality prototype. The article traces the evolutionary process from concept, through narrative development, to completed AR prototypes for the HoloLens and handheld mobile devices. It concludes with some reflections on the affordances of the various hardware formats and posits future directions for the research.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
This demo paper describes a project that engages cutting-edge free viewpoint video (FVV) techniques for developing content for an augmented reality prototype. The article traces the evolutionary process from concept, through narrative development, to completed AR prototypes for the HoloLens and handheld mobile devices. It concludes with some reflections on the affordances of the various hardware formats and posits future directions for the research. |
Ozcinar, Cagri; Cabrera, Julian; Smolic, Aljosa Viewport-aware omnidirectional video streaming using visual attention and dynamic tiles Inproceedings In: 7th European Workshop on Visual Information Processing (EUVIP) 2018, 2018. @inproceedings{Ozcinar2018c,
title = {Viewport-aware omnidirectional video streaming using visual attention and dynamic tiles},
author = {Cagri Ozcinar and Julian Cabrera and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2019/03/EUVIP_2018_co.pdf
https://v-sense.scss.tcd.ie:443/?p=519},
year = {2018},
date = {2018-11-27},
booktitle = {7th European Workshop on Visual Information Processing (EUVIP) 2018},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Chalasani, Tejo; Ondrej, Jan; Smolic, Aljosa Egocentric Gesture Recognition for Head-Mounted AR devices Conference Adjunct Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Forthcoming. @conference{Tejo_2018,
title = {Egocentric Gesture Recognition for Head-Mounted AR devices},
author = {Tejo Chalasani and Jan Ondrej and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/research/6dof/egocentric-gesture-recognition-for-head-mounted-ar-devices/
https://arxiv.org/abs/1808.05380},
year = {2018},
date = {2018-10-16},
booktitle = {Adjunct Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR)},
abstract = {Natural interaction with virtual objects in AR/VR environments makes for a smooth user experience. Gestures are a natural extension from real world to augmented space to achieve these interactions. Finding discriminating spatio-temporal features relevant to gestures and hands in ego-view is the primary challenge for recognising egocentric gestures. In this work we propose a data driven end-to-end deep learning approach to address the problem of egocentric gesture recognition, which combines an ego-hand encoder network to find ego-hand features, and a recurrent neural network to discern temporally discriminating features. Since deep learning networks are data intensive, we propose a novel data augmentation technique using green screen capture to alleviate the problem of ground truth annotation. In addition we publish a dataset of 10 gestures performed in a natural fashion in front of a green screen for training and the same 10 gestures performed in different natural scenes without green screen for validation. We also present the results of our network's performance in comparison to the state-of-the-art using the AirGest dataset.},
keywords = {},
pubstate = {forthcoming},
tppubtype = {conference}
}
Natural interaction with virtual objects in AR/VR environments makes for a smooth user experience. Gestures are a natural extension from real world to augmented space to achieve these interactions. Finding discriminating spatio-temporal features relevant to gestures and hands in ego-view is the primary challenge for recognising egocentric gestures. In this work we propose a data driven end-to-end deep learning approach to address the problem of egocentric gesture recognition, which combines an ego-hand encoder network to find ego-hand features, and a recurrent neural network to discern temporally discriminating features. Since deep learning networks are data intensive, we propose a novel data augmentation technique using green screen capture to alleviate the problem of ground truth annotation. In addition we publish a dataset of 10 gestures performed in a natural fashion in front of a green screen for training and the same 10 gestures performed in different natural scenes without green screen for validation. We also present the results of our network's performance in comparison to the state-of-the-art using the AirGest dataset. |
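The green-screen augmentation mentioned above amounts to chroma-keying the captured gesture frames and compositing the extracted hands over arbitrary backgrounds. The Python sketch below shows a crude key; the green-dominance test and its threshold are illustrative assumptions, not the authors' pipeline.

import numpy as np

def chroma_key_mask(frame_rgb, green_dominance=1.3):
    """Crude green-screen mask: a pixel is background if its green channel
    clearly dominates red and blue. frame_rgb is a float (H, W, 3) array in [0, 1]."""
    r, g, b = frame_rgb[..., 0], frame_rgb[..., 1], frame_rgb[..., 2]
    background = (g > green_dominance * r) & (g > green_dominance * b)
    return ~background  # True where the hand/foreground is

def composite(frame_rgb, background_rgb):
    """Replace the green screen with an arbitrary background of the same size."""
    mask = chroma_key_mask(frame_rgb)[..., None].astype(frame_rgb.dtype)
    return mask * frame_rgb + (1.0 - mask) * background_rgb

# toy usage with random images standing in for a capture and a natural scene
rng = np.random.default_rng(3)
capture = rng.uniform(0, 1, size=(240, 320, 3))
scene = rng.uniform(0, 1, size=(240, 320, 3))
print(composite(capture, scene).shape)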
Monroy, Rafael; Hudon, Matis; Smolic, Aljosa Dynamic Environment Mapping for Augmented Reality Applications on Mobile Devices Conference 2018. @conference{Monroy_DynEM2018,
title = {Dynamic Environment Mapping for Augmented Reality Applications on Mobile Devices},
author = {Rafael Monroy and Matis Hudon and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/research/6dof/dynamic-environment-mapping-for-augmented-reality-applications-on-mobile-devices/
https://arxiv.org/abs/1809.08134},
year = {2018},
date = {2018-10-10},
abstract = {Augmented Reality is a topic of foremost interest nowadays. Its main goal is to seamlessly blend virtual content into real-world scenes. Due to the lack of computational power in mobile devices, rendering a virtual object with a high-quality, coherent appearance in real time remains an area of active research. In this work, we present a novel pipeline that allows for coupled environment acquisition and virtual object rendering on a mobile device equipped with a depth sensor. While keeping human interaction to a minimum, our system can scan a real scene and project it onto a two-dimensional environment map containing RGB+Depth data. Furthermore, we define a set of criteria that allows for an adaptive update of the environment map to account for dynamic changes in the scene. Then, under the assumption of diffuse surfaces and distant illumination, our method exploits an analytic expression for the irradiance in terms of spherical harmonic coefficients, which leads to a very efficient rendering algorithm. We show that all the processes in our pipeline can be executed while maintaining an average frame rate of 31 Hz on a mobile device.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Augmented Reality is a topic of foremost interest nowadays. Its main goal is to seamlessly blend virtual content into real-world scenes. Due to the lack of computational power in mobile devices, rendering a virtual object with a high-quality, coherent appearance in real time remains an area of active research. In this work, we present a novel pipeline that allows for coupled environment acquisition and virtual object rendering on a mobile device equipped with a depth sensor. While keeping human interaction to a minimum, our system can scan a real scene and project it onto a two-dimensional environment map containing RGB+Depth data. Furthermore, we define a set of criteria that allows for an adaptive update of the environment map to account for dynamic changes in the scene. Then, under the assumption of diffuse surfaces and distant illumination, our method exploits an analytic expression for the irradiance in terms of spherical harmonic coefficients, which leads to a very efficient rendering algorithm. We show that all the processes in our pipeline can be executed while maintaining an average frame rate of 31 Hz on a mobile device. |
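The analytic irradiance expression referred to above is the standard order-2 spherical-harmonic formula of Ramamoorthi and Hanrahan (SIGGRAPH 2001). The Python sketch below evaluates it for a surface normal, assuming the nine per-channel lighting coefficients have already been projected from the environment map; the coefficient layout and toy values are illustrative, not the authors' code.

import numpy as np

# Constants from Ramamoorthi & Hanrahan, "An Efficient Representation for
# Irradiance Environment Maps" (SIGGRAPH 2001).
C1, C2, C3, C4, C5 = 0.429043, 0.511664, 0.743125, 0.886227, 0.247708

def sh_irradiance(normal, L):
    """Diffuse irradiance for a unit surface normal, given the nine order-2
    spherical-harmonic coefficients of the environment lighting.
    L is a dict keyed by (l, m); each value is an RGB triple (np.array of 3)."""
    x, y, z = normal
    return (C1 * L[(2, 2)] * (x * x - y * y)
            + C3 * L[(2, 0)] * z * z
            + C4 * L[(0, 0)]
            - C5 * L[(2, 0)]
            + 2 * C1 * (L[(2, -2)] * x * y + L[(2, 1)] * x * z + L[(2, -1)] * y * z)
            + 2 * C2 * (L[(1, 1)] * x + L[(1, -1)] * y + L[(1, 0)] * z))

# toy usage with arbitrary coefficients (real ones come from projecting the
# captured RGB-D environment map onto the spherical-harmonic basis)
rng = np.random.default_rng(4)
coeffs = {(l, m): rng.uniform(0, 1, 3) for l in range(3) for m in range(-l, l + 1)}
print(sh_irradiance(np.array([0.0, 0.0, 1.0]), coeffs))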
Ganter, David; Alain, Martin; Hardman, David; Smolic, Aljoscha; Manzke, Michael Light-Field Volume Rendering on GPU for Streaming Time-Varying Data Conference Pacific Graphics (PG 2018), 2018. @conference{Ganter2018,
title = {Light-Field Volume Rendering on GPU for Streaming Time-Varying Data},
author = {David Ganter and Martin Alain and David Hardman and Aljoscha Smolic and Michael Manzke},
year = {2018},
date = {2018-10-08},
booktitle = {Pacific Graphics (PG 2018)},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Croci, Simone; Knorr, Sebastian; Smolic, Aljosa Sharpness Mismatch Detection in Stereoscopic Content with 360-Degree Capability Inproceedings In: IEEE International Conference on Image Processing (ICIP 2018), 2018. @inproceedings{croci2018a,
title = {Sharpness Mismatch Detection in Stereoscopic Content with 360-Degree Capability},
author = {Simone Croci and Sebastian Knorr and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2018/08/ICIP_2018.pdf},
year = {2018},
date = {2018-10-07},
booktitle = {IEEE International Conference on Image Processing (ICIP 2018)},
abstract = {This paper presents a novel sharpness mismatch detection method for stereoscopic images based on the comparison of edge width histograms of the left and right view. The new method is evaluated on the LIVE 3D Phase II and Ningbo 3D Phase I datasets and compared with two state-of-the-art methods. Experimental results show that the new method highly correlates with user scores of subjective tests and that it outperforms the current state-of-the-art. We then extend the method to stereoscopic omnidirectional images by partitioning the images into patches using a spherical Voronoi diagram. Furthermore, we integrate visual attention data into the detection process in order to weight sharpness mismatch according to the likelihood of its appearance in the viewport of the end-user's virtual reality device. For obtaining visual attention data, we performed a subjective experiment with 17 test subjects and 96 stereoscopic omnidirectional images. The entire dataset including the viewport trajectory data and resulting visual attention maps are publicly available with this paper.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
This paper presents a novel sharpness mismatch detection method for stereoscopic images based on the comparison of edge width histograms of the left and right view. The new method is evaluated on the LIVE 3D Phase II and Ningbo 3D Phase I datasets and compared with two state-of-the-art methods. Experimental results show that the new method highly correlates with user scores of subjective tests and that it outperforms the current state-of-the-art. We then extend the method to stereoscopic omnidirectional images by partitioning the images into patches using a spherical Voronoi diagram. Furthermore, we integrate visual attention data into the detection process in order to weight sharpness mismatch according to the likelihood of its appearance in the viewport of the end-user's virtual reality device. For obtaining visual attention data, we performed a subjective experiment with 17 test subjects and 96 stereoscopic omnidirectional images. The entire dataset including the viewport trajectory data and resulting visual attention maps are publicly available with this paper. |
Alain, Martin; Smolic, Aljosa Light Field Super-Resolution via LFBM5D Sparse Coding Conference IEEE International Conference on Image Processing (ICIP 2018), 2018. @conference{AlainICIP2018,
title = {Light Field Super-Resolution via LFBM5D Sparse Coding},
author = {Martin Alain and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2018/05/LFBM5D_SR.pdf},
year = {2018},
date = {2018-10-07},
booktitle = {IEEE International Conference on Image Processing (ICIP 2018)},
abstract = {In this paper, we propose a spatial super-resolution method for light fields, which combines the SR-BM3D single image super-resolution filter and the recently introduced LFBM5D light field denoising filter.
The proposed algorithm iteratively alternates between an LFBM5D filtering step and a back-projection step.
The LFBM5D filter creates disparity compensated 4D patches which are then stacked together with similar 4D patches along a 5th dimension.
The 5D patches are then filtered in the 5D transform domain to enforce a sparse coding of the high-resolution light field, which is a powerful prior to solve the ill-posed super-resolution problem.
The back-projection step then imposes consistency between the known low-resolution light field and the high-resolution estimate.
We further improve this step by using image guided filtering to remove ringing artifacts.
Results show that significant improvement can be achieved compared to state-of-the-art methods, for light fields captured with either a lenslet camera or a gantry.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
In this paper, we propose a spatial super-resolution method for light fields, which combines the SR-BM3D single image super-resolution filter and the recently introduced LFBM5D light field denoising filter.
The proposed algorithm iteratively alternates between an LFBM5D filtering step and a back-projection step.
The LFBM5D filter creates disparity compensated 4D patches which are then stacked together with similar 4D patches along a 5th dimension.
The 5D patches are then filtered in the 5D transform domain to enforce a sparse coding of the high-resolution light field, which is a powerful prior to solve the ill-posed super-resolution problem.
The back-projection step then imposes consistency between the known low-resolution light field and the high-resolution estimate.
We further improve this step by using image guided filtering to remove ringing artifacts.
Results show that significant improvement can be achieved compared to state-of-the-art methods, for light fields captured with either a lenslet camera or a gantry. |
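The back-projection step can be illustrated on a single view: the residual between the known low-resolution image and the downsampled high-resolution estimate is upsampled and added back. The Python sketch below uses simple bilinear resampling and omits the LFBM5D filtering and guided filtering that the paper interleaves with this step; function names and the toy image are assumptions.

import numpy as np
from scipy.ndimage import zoom

def back_projection_step(hr_estimate, lr_image, scale):
    """One back-projection update: enforce consistency between the current
    high-resolution estimate and the known low-resolution observation."""
    simulated_lr = zoom(hr_estimate, 1.0 / scale, order=1)  # downsample the estimate
    residual = lr_image - simulated_lr
    return hr_estimate + zoom(residual, scale, order=1)     # upsample and correct

def iterative_back_projection(lr_image, scale=2, n_iters=10):
    hr = zoom(lr_image, scale, order=1)                     # initial estimate
    for _ in range(n_iters):
        # In the paper, an LFBM5D sparse-coding filtering step would be applied
        # to the whole light field between these consistency updates.
        hr = back_projection_step(hr, lr_image, scale)
    return hr

# toy usage on a random single image standing in for one light-field view
lr = np.random.default_rng(5).uniform(0, 1, size=(32, 32))
print(iterative_back_projection(lr).shape)  # (64, 64)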
Matysiak, Pierre; Grogan, Mairéad; Le Pendu, Mikaël; Alain, Martin; Smolic, Aljosa A pipeline for lenslet light field quality enhancement Conference IEEE International Conference on Image Processing (ICIP 2018), 2018. @conference{MatysiakICIP2018,
title = {A pipeline for lenslet light field quality enhancement},
author = {Pierre Matysiak and Mairéad Grogan and Le Pendu, Mikaël and Martin Alain and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2018/05/LFPipeline_ICIP18.pdf},
year = {2018},
date = {2018-10-07},
booktitle = {IEEE International Conference on Image Processing (ICIP 2018)},
abstract = {In recent years, light fields have become a major research topic and their applications span across the entire spectrum of classical image processing. Among the different methods used to capture a light field are the lenslet cameras, such as those developed by Lytro. While these cameras give a lot of freedom to the user, they also create light field views that suffer from a number of artefacts. As a result, it is common to ignore a significant subset of these views when doing high-level light field processing. We propose a pipeline to process light field views, first with an enhanced processing of RAW images to extract subaperture images, then a colour correction process using a recent colour transfer algorithm, and finally a denoising process using a state of the art light field denoising approach. We show that our method improves the light field quality on many levels, by reducing ghosting artefacts and noise, as well as retrieving more accurate and homogeneous colours across the sub-aperture images.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
In recent years, light fields have become a major research topic and their applications span across the entire spectrum of classical image processing. Among the different methods used to capture a light field are the lenslet cameras, such as those developed by Lytro. While these cameras give a lot of freedom to the user, they also create light field views that suffer from a number of artefacts. As a result, it is common to ignore a significant subset of these views when doing high-level light field processing. We propose a pipeline to process light field views, first with an enhanced processing of RAW images to extract subaperture images, then a colour correction process using a recent colour transfer algorithm, and finally a denoising process using a state of the art light field denoising approach. We show that our method improves the light field quality on many levels, by reducing ghosting artefacts and noise, as well as retrieving more accurate and homogeneous colours across the sub-aperture images. |
Le Pendu, Mikael; Guillemot, Christine; Smolic, Aljosa High Dynamic Range Light Fields via Weighted Low Rank Approximation Conference IEEE International Conference on Image Processing (ICIP 2018), 2018. @conference{LePenduICIP2018,
title = {High Dynamic Range Light Fields via Weighted Low Rank Approximation},
author = {Le Pendu, Mikael and Christine Guillemot and Aljosa Smolic},
url = {https://v-sense.scss.tcd.ie:443/wp-content/uploads/2018/09/ICIP_HDR_LF.pdf},
year = {2018},
date = {2018-10-07},
booktitle = {IEEE International Conference on Image Processing (ICIP 2018)},
abstract = {In this paper, we propose a method for capturing High Dynamic Range (HDR) light fields with dense viewpoint sampling. Analogously to the traditional HDR acquisition process, several light fields are captured at varying exposures with a plenoptic camera. The RAW data is de-multiplexed to retrieve all light field viewpoints for each exposure and perform a soft detection of saturated pixels. Considering a matrix which concatenates all the vectorized views, we formulate the problem of recovering saturated areas as a Weighted Low Rank Approximation (WLRA) where the weights are defined from the soft saturation detection. We show that our algorithm successfully recovers the parallax in the over-exposed areas while the Truncated Nuclear Norm (TNN) minimization, traditionally used for single view HDR imaging, does not generalize to light fields. Advantages of our weighted approach as well as the simultaneous processing of all the viewpoints are also demonstrated in our experiments.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
In this paper, we propose a method for capturing High Dynamic Range (HDR) light fields with dense viewpoint sampling. Analogously to the traditional HDR acquisition process, several light fields are captured at varying exposures with a plenoptic camera. The RAW data is de-multiplexed to retrieve all light field viewpoints for each exposure and perform a soft detection of saturated pixels. Considering a matrix which concatenates all the vectorized views, we formulate the problem of recovering saturated areas as a Weighted Low Rank Approximation (WLRA) where the weights are defined from the soft saturation detection. We show that our algorithm successfully recovers the parallax in the over-exposed areas while the Truncated Nuclear Norm (TNN) minimization, traditionally used for single view HDR imaging, does not generalize to light fields. Advantages of our weighted approach as well as the simultaneous processing of all the viewpoints are also demonstrated in our experiments. |
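A simple way to approximate the weighted low-rank recovery described above is an EM-style iteration: keep the well-exposed entries, replace the saturated ones with the current low-rank reconstruction, and re-truncate the SVD. The Python sketch below uses binary weights and a fixed rank; the paper's WLRA uses soft saturation weights and a dedicated solver, so this is only an illustration.

import numpy as np

def weighted_low_rank(M, W, rank=3, n_iters=50):
    """EM-style weighted low-rank approximation of matrix M with weights W in [0, 1]
    (1 = reliable entry, 0 = saturated). In the application above, each column
    would be one vectorized light-field view at a given exposure."""
    X = M.copy()
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # rank-r reconstruction
        X = W * M + (1.0 - W) * L                  # keep reliable data, fill the rest
    return L

# toy usage: a rank-2 matrix with roughly 20% of entries "saturated"
rng = np.random.default_rng(6)
ground_truth = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 20))
W = (rng.uniform(size=ground_truth.shape) > 0.2).astype(float)
observed = np.where(W > 0, ground_truth, 1.0)      # saturated entries clipped to 1.0
recovered = weighted_low_rank(observed, W, rank=2)
print(np.abs((recovered - ground_truth) * (1 - W)).mean())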