Thesis and Final Year Project Proposals

Instructions to students:
  1. If you are interested in any topic or have any inquiries relating to the topic, please contact the proposer(s) who created the topic via their email as listed under each topic heading, and please copy the Administrator Gail Weadick.
  2. Once you have agreed the topic with the proposer(s) and made your decision, please register your topic with Professor Smolic and please copy the Administrator Gail Weadick.

Quality Control in 360-Video

Proposed by Sebastian Knorr

360-degree video, also called live-action virtual reality (VR), is one of the latest and most powerful trends in immersive media, with increasing potential for the coming decades. However, capturing 360-degree video is not an easy task, as there are many physical limitations which need to be overcome, especially for capturing and post-processing in stereoscopic 3D (S3D). In general, such limitations result in artifacts which cause visual discomfort when watching the content with an HMD. These artifacts or issues can be divided into three categories: binocular rivalry issues, conflicts of depth cues, and artifacts which occur in both monocular and stereoscopic 360-degree content production.

Within V-SENSE, we developed a framework for quality control in 360-videos which should be improved by new analysis methods or extended with additional artifact detection and, if possible, correction methods.

  • Disparity- / Optical Flow Estimation in 360° Spherical Representation

  • Detection of Stitching and Blending Artifacts in 360° Video / Detection of Local Pseudo-3D in Stereoscopic 360° Video

Real-time 3D skeleton reconstruction using multi-view sequences

Proposed by Rafael Pagés

This project will investigate the possibilities of real-time 3D skeleton estimation using multi-view video sequences. As it is already possible to detect the skeletons of different people in real-time video sequences using deep learning, the key components of the project will be creating the mathematical models and 3D optimisation algorithms needed for real-time 3D triangulation.
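As a rough illustration of the triangulation component, a standard linear (DLT) triangulation of a single joint from two calibrated views can be written in a few lines; the projection matrices below are toy values, not from any real rig:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: 2D image points."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous 3D point is the null vector of A (smallest singular vector).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two toy cameras: identity pose, and a 1-unit baseline along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))  # a point very close to X_true
```

A real-time multi-person system would solve many such systems per frame, with per-joint confidence weighting from the 2D detector.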

3D point cloud motion tracking

Proposed by Rafael Pagés

Image-based 3D reconstruction is a very well studied field; however, there is still a lot of room for improvement in the reconstruction of dynamic sequences. The idea behind this project is to bring known 2D computer vision and image processing algorithms, such as motion estimation and optical flow, into the third dimension.
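As a naive first step toward per-point 3D motion, one can match each point to its nearest neighbour in the next frame, a 3D analogue of block-matching motion estimation in 2D video; real scene-flow methods add smoothness and occlusion handling, so treat this purely as a sketch:

```python
def nearest_neighbour_flow(cloud_t0, cloud_t1):
    """Per-point 3D motion estimate: for every point at time t0,
    the displacement to its nearest neighbour at time t1."""
    flow = []
    for p in cloud_t0:
        q = min(cloud_t1, key=lambda c: sum((p[d] - c[d]) ** 2 for d in range(3)))
        flow.append(tuple(q[d] - p[d] for d in range(3)))
    return flow

# A toy cloud translated by (0.1, 0, 0) between frames.
cloud0 = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
cloud1 = [(x + 0.1, y, z) for x, y, z in cloud0]
print(nearest_neighbour_flow(cloud0, cloud1))  # three vectors ≈ (0.1, 0, 0)
```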

Multi-view video synchronisation in the wild

Proposed by Rafael Pagés

This project will study the possibilities of multi-view video synchronisation. These kinds of sequences are sometimes captured in real-life scenarios using hand-held mobile phones or other consumer cameras, which makes them especially difficult to synchronise. The key tasks of the project will be the analysis of different video synchronisation techniques (based on e.g. audio, visual cues, feature points or motion analysis) and the implementation of these techniques on complicated datasets.
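Audio-based alignment, the simplest of those cues, reduces to finding the lag that maximises the cross-correlation between two audio tracks; a brute-force sketch on toy samples (real footage would use resampled, normalised audio and an FFT-based correlation):

```python
def audio_offset(a, b):
    """Return the lag (in samples) that best aligns b with a, by
    exhaustively maximising their cross-correlation."""
    n = len(a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-n + 1, n):
        score = sum(a[i] * b[i - lag] for i in range(max(0, lag), min(n, n + lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

clip = [0.0, 0.0, 1.0, 0.5, -0.5, 0.0, 0.0, 0.0]
late = clip[2:] + [0.0, 0.0]   # same audio from a camera that started 2 samples later
print(audio_offset(clip, late))  # → 2
```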

Palette Based Image Recolouring using Neural Networks – Please note this option is fully booked and no longer available to select.

Proposed by Mairéad Grogan

Image recolouring is an important application in computer vision, and can be used by photographers when editing images, as a post-processing step in the film industry to change the look and feel of a scene, or by users of apps such as Instagram or Snapchat when they apply filters to images. Palette-based image recolouring is a popular approach: given an input image, a palette of colours is generated that represents the colours in the image. The user can then edit this palette to create the desired colour changes in the image. However, unexpected results can occur; for example, if a light colour is changed to a very dark colour, artifacts can be created. This project proposes using a neural network approach to learn the best recolouring result given the original image, its original palette and the new user-defined palette.
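The palette-generation step is essentially a clustering problem; a toy k-means in RGB space (with a deterministic initialisation for reproducibility, whereas real palette extraction is usually more elaborate):

```python
def extract_palette(pixels, k, iters=20):
    """Toy k-means in RGB space: the k cluster centres form the palette.
    Deterministic initialisation: evenly spaced input pixels."""
    centres = [pixels[i * len(pixels) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            j = min(range(k),
                    key=lambda c: sum((p[d] - centres[c][d]) ** 2 for d in range(3)))
            clusters[j].append(p)
        centres = [tuple(sum(m[d] for m in ms) / len(ms) for d in range(3))
                   if ms else centres[c] for c, ms in enumerate(clusters)]
    return centres

reds = [(250, 10 + i, 10) for i in range(5)]
blues = [(10, 10 + i, 250) for i in range(5)]
print(extract_palette(reds + blues, 2))  # one red-ish and one blue-ish centre
```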

Reference: Junho Cho, Sangdoo Yun, Kyoungmu Lee, Jin Young Choi, “PaletteNet: Image Recolorization with Given Color Palette,” 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
Link to pdf:

Extending soft segmentation for images to video sequences –  Please note this option is fully booked and no longer available to select.

Proposed by Mairéad Grogan

Decomposing an image into layers is an important tool when applying further processing to an image, and allows different segments of the image to be processed independently. For example, recolouring can be applied to each layer independently, some image layers can be removed and others overlaid onto a new background, etc. One soft segmentation approach that has been proposed recently decomposes an image into layers, with each layer associated with a dominant colour in the image [1]. For a given input image, n dominant colours are first estimated. Each pixel in the image is then analysed, and its RGB colour is defined as a linear combination of the n dominant colours. This is used to compute the contribution of each pixel to the n image layers.
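At its core, the per-pixel step solves a small linear system. A much-simplified sketch (plain least squares; the method in [1] adds non-negativity, sum-to-one and colour-model constraints), using a hypothetical three-colour palette:

```python
import numpy as np

# Hypothetical dominant colours of some image: red, blue, white.
palette = np.array([[255, 0, 0], [0, 0, 255], [255, 255, 255]], dtype=float)

def layer_weights(pixel, palette):
    """Least-squares estimate of each dominant colour's contribution
    to one pixel (unconstrained simplification of the unmixing in [1])."""
    w, *_ = np.linalg.lstsq(palette.T, pixel, rcond=None)
    return w

# A pixel that is 25% red and 75% blue should load only on those layers.
pixel = 0.25 * palette[0] + 0.75 * palette[1]
print(layer_weights(pixel, palette))  # ≈ [0.25, 0.75, 0.0]
```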

While this approach has been investigated for images, its extension to video content has not been explored. This project would investigate ways that this soft-segmentation approach could be extended to video data. Temporal consistency of layers would need to be maintained, as well as the introduction of new colours, changing lighting, changing content etc.

Reference: Unmixing-Based Soft Color Segmentation for Image Manipulation, Yagiz Aksoy, Tunc Ozan Aydin, Aljosa Smolic, Marc Pollefeys, ACM Transactions on Graphics 2017.

Parallelization of low rank matrix completion algorithms

Proposed by Mikael Le Pendu

The problem of recovering a matrix from a subset of its entries assuming its rank is sufficiently low has received a great deal of attention in the past decade. A typical application that motivated the fast development of this research area is the Netflix problem [1], where users can rate a few movies and are given recommendations for other movies based on their own ratings and other users preferences. In this example, the data is arranged in a matrix whose columns contain all the users’ ratings for a given movie. Thanks to the correlations between the ratings among users, such a matrix is likely to satisfy the low rank property required for recovering the missing entries. Low rank matrix completion is also key to many other applications, including computer vision (e.g. background subtraction [2], face recognition [3], image inpainting [4]).

However, most algorithms for solving this problem are by nature sequential and too computationally expensive for large scale matrices. This project will aim to study such algorithms and adapt them for parallel processing to take advantage of GPU or multi-threading.
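A representative sequential baseline is the hard-impute iteration: alternately project onto rank-r matrices via the SVD and restore the known entries. The SVD inside the loop is exactly the expensive step a GPU or multi-threaded implementation would target; the sizes and sampling rate below are arbitrary toy values:

```python
import numpy as np

def complete(M, mask, rank, iters=200):
    """Hard-impute iteration for low-rank matrix completion: fill the
    missing entries, project onto rank-r matrices, restore known entries."""
    X = np.where(mask, M, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r approximation
        X = np.where(mask, M, X)                   # keep the observed entries
    return X

rng = np.random.default_rng(0)
M = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 20))  # rank-3 ground truth
mask = rng.random(M.shape) < 0.7                                 # ~70% entries observed
X = complete(M, mask, rank=3)
print(np.linalg.norm(X - M))  # small compared to the error of the zero-filled start
```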

[1] ACM SIGKDD and Netflix. Proceedings of KDD Cup and Workshop, 2007
[2] S. E. Ebadi, V. G. Ones and E. Izquierdo, “Efficient background subtraction with low-rank and sparse matrix decomposition,” IEEE International Conference on Image Processing (ICIP), 2015.
[3] L. Ma, C. Wang, B. Xiao and W. Zhou, “Sparse representation for face recognition based on discriminative low-rank dictionary learning,” IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[4] J. Liu, P. Musialski, P. Wonka, and J. Ye, “Tensor completion for estimating missing values in visual data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 208–220, Jan.2013.

Deep Spherical Harmonics

Proposed by Matis Hudon

Figure 1: Spherical Harmonics Representation

Figure 2: Convolutional Neural Network Example

In [RH01] it is shown that only 9 spherical harmonics coefficients, corresponding to the lowest-frequency modes of the illumination, are needed to compute diffuse shading with an error of 1%. On the other hand, Convolutional Neural Networks have proven to be very interesting and efficient image descriptors for classification. Recent works also use them to estimate 3D geometry directly from RGB images. The goal of this project is to use Deep Neural Networks and Convolutional Neural Networks to estimate the natural (unknown) lighting of a scene by computing the coefficients of the first 9 spherical harmonics. The first part of the project is building a database of rendered images (Blender/Maya or other); this database will then be used to train the CNN/DNN (Python/C++, TensorFlow/Caffe).
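The 9 coefficients the network would regress can be written down directly from the projection in [RH01]; a Monte-Carlo sketch on a sampled environment map (basis constants from [RH01]):

```python
import numpy as np

def sh9_basis(n):
    """First 9 real spherical harmonics (bands l = 0..2), constants as in
    [RH01], evaluated at unit directions n of shape (N, 3)."""
    x, y, z = n[:, 0], n[:, 1], n[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),                      # Y_00
        0.488603 * y, 0.488603 * z, 0.488603 * x,        # Y_1-1, Y_10, Y_11
        1.092548 * x * y, 1.092548 * y * z,              # Y_2-2, Y_2-1
        0.315392 * (3 * z ** 2 - 1),                     # Y_20
        1.092548 * x * z, 0.546274 * (x ** 2 - y ** 2),  # Y_21, Y_22
    ], axis=1)

def project_sh9(directions, radiance, solid_angles):
    """Integrate radiance against each basis function -> 9 coefficients."""
    return (sh9_basis(directions) * (radiance * solid_angles)[:, None]).sum(axis=0)

# Monte-Carlo check on a constant, all-white environment.
rng = np.random.default_rng(1)
d = rng.standard_normal((200000, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)       # uniform directions on the sphere
w = np.full(len(d), 4 * np.pi / len(d))             # equal solid angle per sample
coeffs = project_sh9(d, np.ones(len(d)), w)
print(coeffs[0])  # ≈ 0.282095 * 4π ≈ 3.5449; the higher bands ≈ 0
```

Training pairs for the CNN would be (rendered image, these 9 coefficients per colour channel).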

[RH01] Ravi Ramamoorthi and Pat Hanrahan. An efficient representation for irradiance environment maps. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 497–500. ACM, 2001.

Data augmentation with artistic style

Proposed by: Sebastian Lutz, Tejo Chalasani, Koustav Ghosal

Figure: Input image and 3 different styles. For humans it is still easy to see that this is a lion.

Figure: 4 different levels of abstraction for the same style input.

The success of training Deep Learning algorithms heavily depends on a large amount of annotated data. For many applications, gathering this data can be very time-consuming or difficult. For this reason, datasets are usually enhanced by data augmentation, i.e. applying random transformations on the data to enlarge the set. Often, only relatively simple transformations are applied e.g. random cropping or mirroring an image.

In Style Transfer [1], the goal is to apply the style of one image to another image without changing its content (see figure). It is also possible to choose the level of abstraction when applying the style, i.e. choosing how much weight either the content or the style has on the resulting image. Since the content of an image should stay the same after applying a new style, it seems natural to use Style Transfer as a data augmentation strategy for image-based Deep Learning algorithms. The goal of this project is to explore how useful Style Transfer can be compared to and combined with more traditional approaches, as well as to analyse which styles and levels of abstraction work best.
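For comparison, the "relatively simple" traditional augmentations amount to only a few lines; a style-transfer augmenter would slot in alongside transforms like this random crop + mirror (toy row-list images, purely illustrative):

```python
import random

def augment(image, crop_h, crop_w, seed=None):
    """Baseline augmentation: random crop followed by a random
    horizontal mirror. `image` is a list of rows (H x W)."""
    rng = random.Random(seed)
    top = rng.randrange(len(image) - crop_h + 1)
    left = rng.randrange(len(image[0]) - crop_w + 1)
    crop = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    if rng.random() < 0.5:            # mirror half the time
        crop = [row[::-1] for row in crop]
    return crop

img = [[r * 4 + c for c in range(4)] for r in range(4)]
print(augment(img, 2, 2, seed=0))  # a 2x2 patch, possibly mirrored
```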

[1] Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. “Image style transfer using convolutional neural networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Extending improvements in visual saliency estimation to omnidirectional images

Proposed by: Sebastian Lutz, Tejo Chalasani, Koustav Ghosal

The goal of visual saliency estimation is to predict which parts of an image humans are most likely to look at. There has been much research done in this area for traditional 2D images, but visual saliency estimation for omnidirectional images is still a new research direction. Recently, we introduced a Deep Learning architecture [1] that could be added to any base CNN trained for traditional 2D images and that would improve the results for omnidirectional images. However, the base CNN used in this paper is relatively basic and cannot compete with the state of the art in traditional 2D saliency estimation [2].

The goal of this project would be to extend our approach to work with a state-of-the-art CNN as the base, and to incorporate other pre- and post-processing steps used in traditional 2D saliency estimation, adapting them for omnidirectional images.

[1] Rafael Monroy, Sebastian Lutz, Tejo Chalasani and Aljosa Smolic. “SalNet360: Saliency Maps for omni-directional images with CNN.” arXiv preprint arXiv:1709.06505 (2017)
[2] Bylinskii, Zoya, et al. “MIT saliency benchmark.” (2015): 402-409.

AR Campus Tour

Proposed by: Sebastian Lutz, Tejo Chalasani, Koustav Ghosal

The idea of this project is to build an AR campus tour guide app. The basic idea is that if a person points their phone at an interesting site on campus, the app should recognise the place, estimate the pose, and overlay information about the monument/place. As an extension, we can also overlay the path to the next monument, or to a monument the user chooses. This project would involve

  • Understanding AR.
  • Understanding mobile technologies.
  • Understanding Tango / iOS AR toolkits.
  • Understanding Image recognition, retrieval and pose matching.
  • Understanding simple path planning.

Automatic Irish Sign Language Recognition

Proposed by: Sebastian Lutz, Tejo Chalasani, Koustav Ghosal

CNN-based gesture recognition has been applied extensively to American Sign Language. There has been some effort and research into automatic Irish Sign Language (ISL) detection [1], which looks into applying techniques like SVMs, PCA and HMMs to ISL recognition. However, applying deep learning to ISL hasn’t been researched to our knowledge. With the advent of good gesture recognition architectures in deep learning, it would be a good idea to apply the existing models to Irish Sign Language recognition. The aim of this project is to

  1. Understand deep learning concepts.
  2. Look into existing gesture recognition models.
  3. Learn tools like Caffe/TensorFlow/PyTorch.
  4. Collect training and testing data for Irish Sign Language, either by:
    1. getting the existing dataset from [1], or
    2. creating a dataset with the help of the Irish Sign Language Club. These are the two options the student can explore, in addition to any other ideas they come up with.
  5. Apply the existing models to Irish Sign Language, and potentially publish the results at the Irish Computer Vision Conference, along with the dataset if one is created.


Camera Localisation using Deep Learning for Free View-point Video Applications.

Proposed by Konstantinos Amplianitis

Summary: Free view-point video (FVV) is a system for viewing an event or a performance which allows the user to freely interact with the object in question by controlling the viewpoint and generating new views of a dynamic scene from any known 3D position. For casual FVV applications (e.g., capturing a juggler with your mobile devices), where the cameras are not static but hand-held, the positions of the cameras used to reconstruct the object and the background scene are crucial within the FVV pipeline. The goal of this master thesis is to apply deep learning to learn the visual representation of the scene and predict the positions of the cameras within the scene (indoor and outdoor scenarios). The student should evaluate the predicted camera positions using existing structure-from-motion algorithms and propose potential improvements.


Creation of light field panoramas with a Lytro Illum camera

Proposed by Martin Alain, Sebastian Knorr

The goal of this project is to explore novel light field capture techniques, mainly panoramas, using a Lytro Illum camera.

Light fields capture all light rays passing through a given volume of space. Compared to traditional 2D imaging systems, which capture the spatial intensity of light rays, 4D light fields also contain the angular direction of the rays. This additional information enables multiple applications, such as reconstructing the 3D geometry of a scene, creating new images from virtual points of view, or changing the focus of an image after it is captured. Light fields are also a growing topic of interest in the VR/AR community.

This project aims to capture 360 panorama light fields, similar to the work of [1,2] (see the links below for video examples), by mounting a Lytro Illum camera on a rotating platform and stitching together the multiple light fields obtained. The panorama light fields could then be used in our own applications, including refocusing, depth estimation, and light field rendering. Rendering of 360 light fields in an HMD is of particular interest. Extra care is expected in choosing/designing the scenes to be captured, both from a scientific and an artistic point of view.
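Among those applications, refocusing is the most compact to sketch: the classic shift-and-sum approach shifts each sub-aperture view in proportion to its angular offset and averages. The synthetic check below uses integer shifts only, so it is a simplification of what real Lytro data would need:

```python
import numpy as np

def refocus(lf, slope):
    """Shift-and-sum refocusing. lf has shape (U, V, H, W); slope
    selects the depth plane brought into focus."""
    U, V, H, W = lf.shape
    out = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            du = int(round(slope * (u - U // 2)))
            dv = int(round(slope * (v - V // 2)))
            out += np.roll(lf[u, v], (du, dv), axis=(0, 1))
    return out / (U * V)

# Synthetic check: views that are integer-shifted copies of one image
# refocus exactly back to that image at the matching slope.
rng = np.random.default_rng(0)
base = rng.random((16, 16))
lf = np.stack([[np.roll(base, (-(u - 1), -(v - 1)), axis=(0, 1))
                for v in range(3)] for u in range(3)])
print(np.allclose(refocus(lf, 1.0), base))  # True
```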

This work could be extended to other capture configurations, e.g. time-lapses and time slices, or hybrid setup combining a Lytro Illum and a DSLR.

[1] Birklbauer, C. and Bimber, O., Panorama Light-Field Imaging, ACM Siggraph, 2012
[2] Birklbauer, C. and Bimber, O., Panorama Light-Field Imaging, In proceedings of Eurographics (Computer Graphics Forum), 33(2), 43-52, 2014

Related links:

Creation of a synthetic light field dataset

Proposed by: Martin Alain, David Hardman

The goal of this project is to create a novel synthetic light field dataset using Blender.

Light fields capture all light rays passing through a given volume of space. Compared to traditional 2D imaging systems, which capture the spatial intensity of light rays, 4D light fields also contain the angular direction of the rays. This additional information enables multiple applications, such as reconstructing the 3D geometry of a scene, creating new images from virtual points of view, or changing the focus of an image after it is captured. Light fields are also a growing topic of interest in the VR/AR community.

Synthetic light fields have been widely used in the community for their convenience compared to light field capture. One recent example of such a dataset is the HCI dataset [1] (see the related links below), which uses Blender to create light fields along with their ground-truth depth maps, hence creating a benchmark for depth map estimation from light fields. In this project, novel camera configurations (in terms of position and density) going beyond the traditional array configuration used for light field capture should also be explored, and evaluated using typical light field applications such as refocusing, depth estimation, and rendering. Extra care is expected in choosing/designing the virtual scenes to be rendered, both from a scientific and an artistic point of view.
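Camera placement for such configurations is easy to script; the two generators below are illustrative (positions would then be assigned to Blender cameras via its Python API):

```python
import math

def camera_grid(rows, cols, spacing):
    """Classic planar camera-array positions, centred on the origin."""
    return [((c - (cols - 1) / 2) * spacing, (r - (rows - 1) / 2) * spacing, 0.0)
            for r in range(rows) for c in range(cols)]

def camera_arc(n, radius, arc_deg):
    """A non-planar alternative: n cameras on a circular arc of the
    given angular extent, all equidistant from the scene centre."""
    half = math.radians(arc_deg) / 2
    step = 2 * half / (n - 1)
    return [(radius * math.sin(-half + i * step), 0.0,
             radius * math.cos(-half + i * step)) for i in range(n)]

print(len(camera_grid(3, 3, 0.1)), len(camera_arc(5, 2.0, 60.0)))  # 9 5
```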

[1] Honauer, Katrin; Johannsen, Ole; Kondermann, Daniel; Goldlücke, Bastian, A Dataset and Evaluation Methodology for Depth Estimation on 4D Light Fields, ACCV 2016

Related links:

360-degree video streaming for virtual reality

Proposed by Cagri Ozcinar

1) Viewport prediction for 360-degree video streaming using time-series forecasting models.

The goal of this project is to predict the viewport position of a virtual reality headset using time-series forecasting. In this project, the prediction performance of various time-series models, such as AR, MA, ARMA, etc., will be analysed using gathered viewport trajectories.
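The AR member of that model family can be fitted with plain least squares; the yaw trace below is a hypothetical noise-free stand-in for real viewport data:

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of an AR(p) model
    x_t = a_1 x_{t-1} + ... + a_p x_{t-p}."""
    n = len(series)
    X = np.column_stack([series[p - 1 - k:n - 1 - k] for k in range(p)])
    coeffs, *_ = np.linalg.lstsq(X, series[p:], rcond=None)
    return coeffs

def predict_next(series, coeffs):
    """One-step-ahead forecast from the last p samples."""
    p = len(coeffs)
    return float(np.dot(coeffs, series[-1:-p - 1:-1]))

# Toy head-yaw trace generated by a known, stable AR(2) process.
yaw = [1.0, 1.0]
for _ in range(60):
    yaw.append(1.5 * yaw[-1] - 0.7 * yaw[-2])
yaw = np.array(yaw)
a = fit_ar(yaw, 2)
print(a)  # ≈ [1.5, -0.7]
```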

Requirements: Python or R

2) Quality of experience modeling for adaptive 360-degree video streaming.

The aim of this project is to build a quality model for 360-degree video streaming. For this purpose, available 360-degree video quality metrics will be integrated into an existing tool for traditional video, and a model will be trained on the gathered subjective scores for 360-degree video in order to predict 360-degree video quality in adaptive streaming systems.

Requirements: Python

V-SENSE talk: Dr. Ronny Hänsch, Post-Doctoral researcher, TU Berlin

Speaker: Dr. Ronny Hänsch, Post-Doctoral researcher, TU Berlin

Venue: Maxwell Theatre in the Hamilton Building

Date & Time: Thursday 1st June at 11am

Title: From 2D to 3D – and back

Abstract: The work of the Computer Vision and Remote Sensing Department of the Technical University Berlin spans a wide range of research areas including traditional and more exotic methods of 3D reconstruction (i.e. TomoSAR – the estimation of 3D information from radar images), single image depth estimation, the usage of prior knowledge for shape completion and correction, shape abstraction, as well as object detection in 3D data.

After a short overview of recent work at the department, a small selection of methods is discussed in detail. The first part introduces a multi-view stereo approach with slightly changed data acquisition and pre-processing which leads to astonishing results over weakly textured areas. The second part addresses two examples of further processing of point cloud data: near-real-time object detection and shape abstraction. The last part of the talk covers an approach to evaluate structure-from-motion and multi-view stereo methods using synthetic images that – besides being photo-realistic – contain many image characteristics of real cameras.


Speaker Bio: Ronny Hänsch received the Diploma degree in computer science and the Ph.D. degree from the Technische Universität Berlin, Berlin, Germany, in 2007 and 2014 respectively. His research interests include computer vision, machine learning, object detection, neural networks, and Random Forests. He worked in the field of object detection and classification from remote sensing images, with a focus on polarimetric synthetic aperture radar images. His recent research interests focus on the development of probabilistic methods for 3D reconstruction by structure from motion as well as ensemble methods for image analysis.

The scenographic turn: the pharmacology of the digitisation of scenography


Néill O’Dwyer


Theatre has long been acknowledged as a space that facilitates collaboration, not only between different fields of art, but also between the arts and sciences. Scenography, through its rationalisation of the performance space, is the territory where these collaborations are played out, and is therefore the leading field to employ digital technologies. Increased computing power has introduced new opportunities for digital simulation – already widely exploited in the film and gaming industries – on the live stage. This article focuses on the shift taking place in the performing arts, brought about by the migration from mechanical to digital technologies and the import of software into scenographic working processes; this is my definition of the scenographic turn. What are the repercussions of this turn aesthetically, socially and politically?

As a catalyst for discussion I will analyse the Australian-based performing arts collective Chunky Move’s digitally engaged performance, Mortal Engine, which is a quintessential example exhibiting the new possibilities made available to live performance by digital technologies. In doing so, this article considers the new technohistoric specificities of the work in relation to the concept of the machine as performer, thus highlighting a certain development in working processes that facilitates an increase in machines’ responsibility and intention in live performance. This logically also demands a discussion of how technology impacts on the formulation of choreography and dramaturgy, and what this means for working processes going forward. The technological philosophy of Bernard Stiegler is engaged to help unpack and reflect upon the sociohistoric, ontological and cultural implications of such technologically engaged productions.

Download Paper: The scenographic turn: the pharmacology of the digitisation of scenography

Colour Transfer using the L2 Metric


Colour transfer is an important pre-processing step in many applications, including stereo vision, surface reconstruction and image stitching. It can also be applied to images and videos as a post processing step to create interesting special effects and change their tone or feel. While many software tools are available to professionals for editing the colours and tone of an image, bringing this type of technology into the hands of everyday users, with an interface that is intuitive and easy to use, has generated a lot of interest in recent years.

One approach often used for colour transfer is to allow the user to provide a palette image which has the desired colour distribution, and use it to transfer the desired colour feel to the original target image. This approach allows the user to easily generate the desired colour transfer result without the need for any further manual interaction.

Demo: Colour Transfer Using the L2 metric

It has recently been shown that the L2 metric can be used to create good colour transfer results when the user provides a palette image for recolouring [1]. This technique models the colour distributions of the target and palette images using Gaussian Mixture Models (GMMs) and registers these GMMs to compute the colour transfer function that maps the colours of the palette image to the target image. It has been shown to outperform other state-of-the-art colour transfer techniques, and can be easily extended to video content. A demo of this colour transfer technique is linked above.

In the V-Sense project we are investigating ways to extend this L2 based colour transfer approach to other applications, finding areas in which this robust metric could prove advantageous.
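For orientation, the classic global baseline that palette-based methods such as [1] are compared against simply matches per-channel statistics (in the spirit of Reinhard et al.'s colour transfer; this is NOT the GMM/L2 registration itself):

```python
import numpy as np

def transfer_stats(target, palette):
    """Global colour transfer baseline: match each channel's mean and
    standard deviation of the target image to the palette image."""
    t_mu, t_sd = target.mean(axis=(0, 1)), target.std(axis=(0, 1))
    p_mu, p_sd = palette.mean(axis=(0, 1)), palette.std(axis=(0, 1))
    return (target - t_mu) / (t_sd + 1e-8) * p_sd + p_mu

rng = np.random.default_rng(0)
target = rng.random((32, 32, 3))
palette = 0.5 + 0.2 * rng.random((32, 32, 3))   # a brighter, flatter "look"
out = transfer_stats(target, palette)
print(out.mean(axis=(0, 1)))  # ≈ the palette image's per-channel means
```

The GMM/L2 approach replaces this single global affine map with a smooth, multi-modal transfer function.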



[1] Mairéad Grogan and Rozenn Dahyot, “Robust Registration of Gaussian Mixtures for Colour Transfer,” arXiv, May 2017.




Demo: Visual Attention for Omnidirectional Images in VR Applications


Understanding visual attention has always been a topic of great interest in different research communities. It is particularly important for omnidirectional images (ODIs) viewed with a head-mounted display (HMD), where only a fraction of the captured scene, namely the viewport, is displayed at a time.

Here, we share a demo that displays a set of ODIs (provided by the user or using the ones available), while it collects the viewport’s center position at every animation frame for each ODI. The data collected is automatically downloaded at the end of the session.
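The logged viewport centres map straightforwardly onto ODI pixel coordinates; this helper assumes one common convention (conventions differ between tools, so treat the signs and ranges as an assumption):

```python
import math

def viewport_to_equirect(yaw, pitch, width, height):
    """Map a viewport centre (yaw, pitch, in radians) to pixel coordinates
    in a width x height equirectangular ODI. Assumed convention:
    yaw in [-pi, pi) increasing left to right, pitch in [-pi/2, pi/2]
    increasing upward, origin at the top-left pixel."""
    x = (yaw + math.pi) / (2 * math.pi) * width
    y = (math.pi / 2 - pitch) / math.pi * height
    return x, y

print(viewport_to_equirect(0.0, 0.0, 1000, 500))  # (500.0, 250.0): image centre
```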



Croci, Simone; Knorr, Sebastian; Smolic, Aljosa. Saliency-Based Sharpness Mismatch Detection For Stereoscopic Omnidirectional Images. In: 14th European Conference on Visual Media Production, London, UK. Forthcoming.

Croci, Simone; Knorr, Sebastian; Goldmann, Lutz; Smolic, Aljosa. A Framework for Quality Control in Cinematic VR Based on Voronoi Patches and Saliency. In: International Conference on 3D Immersion, Brussels, Belgium. Forthcoming.

Monroy, Rafael; Lutz, Sebastian; Chalasani, Tejo; Smolic, Aljosa. SalNet360: Saliency Maps for omni-directional images with CNN. arXiv preprint arXiv:1709.06505, 2017.

Abreu, Ana De; Ozcinar, Cagri; Smolic, Aljosa. Look around you: saliency maps for omnidirectional images in VR applications. In: 9th International Conference on Quality of Multimedia Experience (QoMEX), 2017.

Dr Laura Toni talk Invitation: Wednesday 26th April at 2pm

Speaker: Dr Laura Toni, Lecturer University College London

Venue: Large Conference Room, O’Reilly Institute

Date & Time: Wednesday 26th April at 2pm

Title: Navigation-Aware Communications for Interactive Multiview Video Systems

Abstract: Recent advances in video technology have moved research toward novel interactive multiview services, such as 360-degree videos and virtual reality, where users can actively interact with the scene. Because of this interactivity, users are no longer seen as mere terminals interconnected by links but as active players in the communication, with all possible levels of interactivity: from passively consuming content, to actively crafting one's own media stream and social experience. This user-centric paradigm calls for adaptation of streaming policies to both the nature of the content of the communication and the social dynamics among users, to face the astonishing diversity of novel networks.

In this talk, we will provide an overview of adaptive communication paradigms that need to be designed to cope with both the massive traffic of multiview data and the interactivity level of the users. Then, we will describe in more detail novel navigation-aware frameworks for optimal coding, streaming, and multiview processing in adaptive streaming processes. We conclude with a perspective on open challenges in the field of 360-videos, stressing in particular the need to learn users’ behaviours in real time in order to optimally design future interactive streaming systems.

Speaker Bio: Laura Toni received the M.S. and Ph.D. degrees in electrical engineering, from the University of Bologna, Italy, in 2005 and 2009, respectively. In 2007, she was a visiting scholar at the University of California at San Diego (UCSD), CA, and since 2009, she has been a frequent visitor to the UCSD, working on media coding and streaming technologies.

Between 2009 and 2011, she worked at the Tele-Robotics and Application (TERA) department at the Italian Institute of Technology (IIT), investigating wireless sensor networks for robotics applications. In 2012, she was a Post-doctoral fellow at the UCSD and between 2013 and 2016 she was a Post-doctoral fellow in the Signal Processing Laboratory (LTS4) at the Swiss Federal Institute of Technology (EPFL), Switzerland.

Since July 2016, she has been appointed as Lecturer in the Electronic and Electrical Engineering Department of University College London (UCL), UK.

Her research mainly involves interactive multimedia systems, decision-making strategies under uncertainty, large-scale signal processing and communications.