Double network superresolution

Thesis Type: Doctorate

Institution Where the Thesis Was Conducted: Orta Doğu Teknik Üniversitesi (Middle East Technical University), Graduate School of Natural and Applied Sciences, Turkey

Thesis Approval Date: 2019

Thesis Language: English

Student: CEM TARHAN

Advisor: Gözde Akar

Abstract:

As social platforms have become widespread, image- and video-based material is shared in ever greater volume each day. This raises issues of both storage and internet bandwidth usage. For a user to run a superresolution (SR) algorithm effectively on a mobile device, a lightweight yet well-performing algorithm must be designed. In recent years, convolutional neural networks (CNNs) have been widely used for SR. Despite their indisputable success, CNNs lack a proper mathematical account of how and what they learn. In the first part of the thesis, we prove that CNN elements act as inverse problem solvers that are optimal for the purpose. We show that the learned coefficients of a network obey a concept termed Representation-Dictionary Duality. We also show that skip connections are necessary for the convergence of the network. The high computational load of state-of-the-art algorithms renders them unusable on a mobile platform. In the second part of the thesis, we propose a novel double network superresolution (DNSR) algorithm that requires a dramatically low number of parameters. We propose the use of two networks trained on disjoint data. One network is responsible for reconstructing sharp transitions in an image, while the other is specialized for texture reconstruction. DNSR not only learns an SR solution with a practically feasible number of operations but also achieves superior performance in reconstructing high-frequency details with high fidelity. Finally, we propose a Detail Fusion Interpolator (DFI) that combines optical flow estimation and motion compensation blocks within a small network. By extending DNSR to the multi-frame setting, we compare its performance with state-of-the-art video SR algorithms and with single-frame DNSR. We show that DFI is indeed able to compensate for motion and that the combined system performs better than the single-frame approach.
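
The idea of training two networks on disjoint data can be illustrated with a minimal sketch. The abstract does not say how the training data are divided, so the rule below (mean gradient magnitude per patch against a fixed threshold) is only an assumed placeholder, and the names `split_patches`, `patch`, and `threshold` are hypothetical. It shows one way non-overlapping patches of a grayscale image might be partitioned into a "sharp transition" set and a remaining set, each of which could then feed a separate network.

```python
import numpy as np


def split_patches(image, patch=8, threshold=0.1):
    """Partition non-overlapping patches into two disjoint lists.

    Assumption (not from the thesis): a patch counts as containing a
    sharp transition when its mean gradient magnitude exceeds `threshold`.
    """
    # Finite-difference gradients along rows (axis 0) and columns (axis 1).
    gy, gx = np.gradient(image.astype(np.float64))
    magnitude = np.hypot(gx, gy)

    sharp, other = [], []
    height, width = image.shape
    for i in range(0, height - patch + 1, patch):
        for j in range(0, width - patch + 1, patch):
            block = image[i:i + patch, j:j + patch]
            score = magnitude[i:i + patch, j:j + patch].mean()
            # Each patch lands in exactly one list, so the sets are disjoint.
            (sharp if score > threshold else other).append(block)
    return sharp, other


# Example: a 16x16 image with a vertical step inside the left half.
demo = np.zeros((16, 16))
demo[:, 4:] = 1.0
sharp_patches, other_patches = split_patches(demo)
```

In this toy example only the two left-hand patches contain the step, so they are routed to the first list; a real pipeline would need a criterion that also separates texture from flat regions.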