End-to-end learned image compression with conditional latent space modelling for entropy coding

Thesis Type: Master's

Institution: Orta Doğu Teknik Üniversitesi (Middle East Technical University), Faculty of Engineering, Department of Electrical and Electronics Engineering, Türkiye

Thesis Approval Date: 2019

Thesis Language: English

Student: AZİZ BERKAY YEŞİLYURT

Advisor: Fatih Kamışlı

Abstract:

This thesis presents a lossy image compression system based on an end-to-end trainable neural network. Traditional compression algorithms use linear transform, quantization and entropy coding steps that are designed around simple models of the data and are intended to have low complexity. In neural network based image compression methods, processing steps such as the transform and entropy coding are performed using neural networks. Neural networks enable transforms and probability models for entropy coding that can process or represent data with much more complex dependencies than simple models can capture, at the expense of higher computational complexity than traditional methods.

One major line of work on neural network based lossy image compression uses an autoencoder-type neural network for the transform and inverse transform of the compression system. The quantization of the latent variables, i.e. the transform coefficients, and the arithmetic coding of the quantized latent variables are done with traditional methods. However, the probability distribution of the latent variables, which the arithmetic coder relies on, is also represented with a neural network. The parameters of all neural networks in the system are learned jointly from a training set of real images by minimizing the rate-distortion cost.
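
To make the rate-distortion cost concrete, here is a minimal Python sketch of how it can be evaluated for one image once the latents are available. It assumes rounding as the quantizer, a probability model given as a plain function `pmf` (standing in for the learned network), mean squared error as the distortion, and the convention cost = rate + lam * distortion; all function names are hypothetical and not taken from the thesis.

```python
import math
from typing import Callable, List, Sequence

def quantize(y: Sequence[float]) -> List[int]:
    """Uniform scalar quantization: round each latent to the nearest integer (ties go up)."""
    return [math.floor(v + 0.5) for v in y]

def rate_bits(y_hat: Sequence[int], pmf: Callable[[int], float]) -> float:
    """Ideal code length, in bits, of the quantized latents under the model pmf.
    An arithmetic coder driven by the same pmf comes close to this length."""
    return -sum(math.log2(pmf(v)) for v in y_hat)

def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between original and reconstructed pixels."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def rd_cost(x: Sequence[float], x_hat: Sequence[float], y: Sequence[float],
            pmf: Callable[[int], float], lam: float) -> float:
    """Rate in bits per pixel plus lam * MSE, for a flattened grayscale image x (assumed)."""
    bpp = rate_bits(quantize(y), pmf) / len(x)
    return bpp + lam * mse(x, x_hat)
```

For example, with a toy `pmf` that assigns probability 0.5 to every symbol, four latents cost 4 bits; over a four-pixel image with MSE 1.0 and `lam = 0.5`, the cost is 1.0 + 0.5 = 1.5. This sketch covers evaluation only: rounding has zero gradient almost everywhere, so in this line of work training commonly replaces it with additive uniform noise.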

One major work assumes that the latent variables within a single channel (i.e. feature map or signal band) are independent and learns a single distribution model for each channel. The same authors then extend their work by incorporating a hyperprior neural network that captures the dependencies in the latent representation, which improves the compression performance significantly. This thesis uses an alternative method to exploit the dependencies of the latent representation. The joint density of the latent representation is modeled as a product of conditional densities, which are learned using neural networks. However, each latent variable is not conditioned on all previous latent variables, as in the chain rule for factoring joint distributions, but only on a few previous variables, namely the left, upper and upper-left spatial neighbors of that latent variable, based on a Markov property assumption. The compression performance is on par with the hyperprior-based work, but the conditional densities require a much simpler network than the hyperprior network in the literature. While the conditional densities require much less training time than the hyperprior-based neural network, owing to their simplicity and smaller number of parameters, their inference time is longer.
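
To illustrate the Markov factorization: for one latent channel y with entries y[i][j] visited in raster order, the chain rule would condition each entry on every entry visited before it, whereas here the joint density is modeled as p(y) = product over (i, j) of p(y[i][j] | y[i][j-1], y[i-1][j], y[i-1][j-1]). The Python sketch below computes the ideal code length of a channel under such a model. It is only a stand-in, not the thesis's network: the learned conditional density is replaced by a hypothetical discretized Gaussian with fixed scale whose mean comes from a hand-set planar predictor, neighbors outside the map are assumed to be zero, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[int]]

def causal_context(y: Grid, i: int, j: int) -> Tuple[int, int, int]:
    """Left, upper and upper-left neighbours of y[i][j] (zero outside the map, an assumed padding)."""
    def at(r: int, c: int) -> int:
        return y[r][c] if r >= 0 and c >= 0 else 0
    return at(i, j - 1), at(i - 1, j), at(i - 1, j - 1)

def gaussian_cdf(t: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def discretized_gaussian_pmf(v: int, mu: float, sigma: float) -> float:
    """P(Y = v): Gaussian mass over the quantization bin [v - 0.5, v + 0.5]."""
    p = gaussian_cdf((v + 0.5 - mu) / sigma) - gaussian_cdf((v - 0.5 - mu) / sigma)
    return max(p, 1e-9)  # floor keeps log2 finite for very unlikely symbols

def conditional_params(ctx: Tuple[int, int, int]) -> Tuple[float, float]:
    """Hypothetical stand-in for the learned conditional network:
    planar predictor mu = left + up - upper_left, with a fixed scale of 1.0."""
    left, up, up_left = ctx
    return float(left + up - up_left), 1.0

def total_bits(y: Grid) -> float:
    """Ideal code length of one latent channel under the Markov factorization."""
    bits = 0.0
    for i in range(len(y)):         # raster order; a decoder must follow the same order,
        for j in range(len(y[i])):  # because each context uses already-decoded neighbours
            mu, sigma = conditional_params(causal_context(y, i, j))
            bits -= math.log2(discretized_gaussian_pmf(y[i][j], mu, sigma))
    return bits
```

A smooth channel such as [[1, 1], [1, 1]] costs fewer bits than [[1, -1], [-1, 1]], because its entries match the predicted means. Since every symbol's distribution depends on neighbors that must be decoded first, decoding proceeds one symbol at a time; this sequential dependence is one plausible reason for the longer inference time noted above.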