Human action recognition for various input characteristics using 3 dimensional residual networks


Thesis Type: Master's

Institution: Orta Doğu Teknik Üniversitesi (Middle East Technical University), Graduate School of Natural and Applied Sciences, Turkey

Approval Date: 2019

Thesis Language: English

Student: GÜLİN TÜFEKCİ

Advisor: İlkay Ulusoy

Abstract:

Action recognition using deep neural networks is a broad research area with applications such as statistical analysis of human behavior, detection of abnormalities using surveillance cameras, and robotic systems. Previous studies have mainly proposed new machine learning algorithms and deep network architectures to achieve higher recognition accuracy. Instead of proposing another network that yields a small accuracy gain, this thesis evaluates how different input characteristics affect the learning capacity of the networks. 3-dimensional residual networks are used for this purpose because of their effective learning process. Among all the modifications applied to the inputs, two provide high accuracy gains: increasing the sample duration up to 60 frames, and masking the RGB pixel values with the motion flow between consecutive frames. Employing 60 frames instead of 16 frames quadruples the computation time while increasing accuracy by 10%. Masking the frames yields a 12% gain in recognition accuracy. Both modifications contribute to the learning process of the network: the longer temporal extent emphasizes the relations between patterns, and the masking guides the network to focus on the areas where the main action takes place. Obtaining accuracy gains of this size by modifying only the input is notable. Moreover, pre-training the network on a large-scale dataset improves the recognition accuracy even further. These results are valuable because the input characteristics that yield high accuracy gains can also be applied to other networks to increase their recognition accuracy.
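
The two input modifications named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the clips are sampled or how the flow mask is built, so everything here is an assumption made for illustration: the function names `sample_clip` and `mask_with_flow`, the looping of short videos, the precomputed flow array, and the hard magnitude threshold are all hypothetical and are not taken from the thesis.

```python
import numpy as np


def sample_clip(video, sample_duration, start=0):
    """Return `sample_duration` consecutive frames from `video`.

    video: array of shape (num_frames, H, W, 3).
    Assumption: a video shorter than the requested duration is looped.
    """
    num_frames = video.shape[0]
    indices = (start + np.arange(sample_duration)) % num_frames
    return video[indices]


def mask_with_flow(frames, flow, threshold=1.0):
    """Keep RGB values only where motion is present.

    frames: (T, H, W, 3) RGB clip.
    flow:   (T - 1, H, W, 2) precomputed motion flow, where flow[t]
            describes the motion between frames[t] and frames[t + 1].
    Assumption: pixels whose flow magnitude is below `threshold`
    are set to zero; the thesis may use a different masking rule.
    Returns an array of shape (T - 1, H, W, 3).
    """
    magnitude = np.linalg.norm(flow, axis=-1)        # (T - 1, H, W)
    mask = (magnitude >= threshold)[..., np.newaxis]  # (T - 1, H, W, 1)
    return frames[:-1] * mask


# Example: a 20-frame dummy video, sampled as a 60-frame clip.
video = np.ones((20, 4, 4, 3), dtype=np.uint8)
clip = sample_clip(video, sample_duration=60)

# Dummy flow with motion at a single pixel location.
flow = np.zeros((59, 4, 4, 2), dtype=np.float32)
flow[:, 0, 0, :] = [3.0, 4.0]
masked = mask_with_flow(clip, flow, threshold=1.0)
```

In this sketch a 60-frame clip holds 60 / 16 = 3.75 times as many frames as a 16-frame clip, which is in line with the roughly fourfold computation time reported in the abstract, assuming the cost grows about linearly with the number of frames.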