Thesis Type: Master's
Institution: Middle East Technical University (Orta Doğu Teknik Üniversitesi), Faculty of Engineering, Department of Computer Engineering, Türkiye
Thesis Approval Date: 2021
Thesis Language: English
Student: ENGİN FIRAT
Supervisor: Emre Akbaş
Abstract:
In this thesis, we study combining a siamese-network-based object tracker with a model trained under the self-supervised learning paradigm. We define grayscale video colorization as the pretext task for self-supervised learning and select similarity-based object tracking as the downstream task. Both the siamese tracker and the colorization network rely on the similarity between consecutive video frames; the spatio-temporal coherence between the frames of a video is what enables the network to learn this similarity. We study different ways of combining the two networks. Since the colorization framework is built on similarity learning, we cross-correlate the output features of the colorization network in the same way as the siamese tracker does. We then combine the two methods by taking a weighted average of their score maps to obtain a combined score map, and we search for the optimal value of this weight through several experiments. We also experiment with different neural network architectures for the colorization framework. Our experimental results show that using the self-supervised pretext task improves the overall success rate when the combined network is further trained in a supervised manner. Finally, we show that the self-supervised video colorization network offers an alternative way to use modern, deeper networks in siamese architectures, because it alleviates the strict translation-invariance restriction that siamese architectures impose.
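The abstract names two operations: cross-correlating feature maps as a siamese tracker does, and taking a weighted average of two score maps. The following minimal NumPy sketch shows both under stated assumptions. It is not the thesis's implementation. It assumes both branches produce feature maps with matching channel counts and spatial sizes, so that their score maps align. The function names `cross_correlate` and `fuse_score_maps` and the parameter `weight` are hypothetical, and the random features stand in for real network outputs.

```python
import numpy as np


def cross_correlate(template: np.ndarray, search: np.ndarray) -> np.ndarray:
    """Slide a (C, h, w) template feature map over a (C, H, W) search map.

    Returns an (H - h + 1, W - w + 1) score map whose entry (i, j) is the
    inner product of the template with the search window at (i, j),
    summed over all channels (valid cross-correlation, stride 1).
    """
    c, h, w = template.shape
    c_s, H, W = search.shape
    if c != c_s:
        raise ValueError("template and search must have the same channel count")
    scores = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = search[:, i:i + h, j:j + w]
            scores[i, j] = np.sum(window * template)
    return scores


def fuse_score_maps(tracker_map: np.ndarray, color_map: np.ndarray,
                    weight: float) -> np.ndarray:
    """Weighted average of two same-shaped score maps (hypothetical helper).

    weight = 1.0 keeps only the tracker map; weight = 0.0 keeps only the
    colorization map.
    """
    if tracker_map.shape != color_map.shape:
        raise ValueError("score maps must have the same shape")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * tracker_map + (1.0 - weight) * color_map


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder features: in practice these would come from the tracker
    # backbone and from the colorization network, respectively.
    z_track, x_track = rng.normal(size=(8, 3, 3)), rng.normal(size=(8, 9, 9))
    z_color, x_color = rng.normal(size=(8, 3, 3)), rng.normal(size=(8, 9, 9))
    s_track = cross_correlate(z_track, x_track)
    s_color = cross_correlate(z_color, x_color)
    fused = fuse_score_maps(s_track, s_color, weight=0.5)
    # The predicted target position is the location of the maximum score.
    peak = np.unravel_index(np.argmax(fused), fused.shape)
    print("fused map shape:", fused.shape, "peak at:", peak)
```

Because the fused map is a convex combination, sweeping `weight` over [0, 1] and measuring tracking success at each value is one straightforward way to carry out the kind of weight search described above. Real score maps from different networks may also need rescaling to a common range before they are averaged.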