Human behavior understanding using video analysis


Thesis Type: Doctorate

Institution: Middle East Technical University, Faculty of Engineering, Department of Electrical and Electronics Engineering, Turkey

Approval Date: 2016

Student: CELAL ONUR GÖKÇE

Advisor: UĞUR HALICI

Abstract:

In this study, we propose a new hierarchical architecture for the human behavior understanding problem. A new dataset, the football video game (FVG) dataset, is generated; it involves activities more complex than those in any other dataset in the literature. The ball is detected using multilayer neural networks trained with the gradient descent algorithm and enhanced with the learning-with-queries method, and it is tracked using a growing window algorithm. After a region of interest is extracted around the detected and tracked ball, the primitive action is recognized using one of three approaches. The first is based on the well-known cuboid features of Dollár et al. The second is the mixture of poses (MoP) approach proposed in this study. The third is an extension of the mixture of poses in which a Fisher vector is employed to represent the mixture of poses in vector form. The primitive action sequences found in this layer are fed to the higher-level activity recognition layer. In the activity recognition layer, one of two approaches is used. One is the well-known Hidden Markov Model (HMM), known to work well on time series data; the other is a Context-Free Grammar (CFG), which can theoretically recognize more complex sequences that an HMM cannot. Another novelty of this study is that new activity types are learned using grammar induction with the Cocke, Younger and Kasami (CYK) algorithm. In this way, either pre-taught activity types can be recognized or new activity types can be learned. The FVG dataset that we generated is tested with four combinations of approaches, and encouraging results are obtained. For primitive action recognition, the proposed MoP algorithm has a success rate of 69.3%, clearly exceeding the widely referenced cuboid features of Dollár et al., which achieve a success rate of 54.0%. Employing the Fisher vector with MoP slightly decreases performance to 67.0% while achieving high speed.
For complex activity recognition, the MoP-CFG pair achieves a success rate of 62.5%, the same as the MoP-HMM pair. Both clearly exceed the cuboid-CFG/HMM pairs, which achieve a success rate of 42.5%. There are activities that can be recognized by the CFG but not by the HMM; an example is the passing-the-ball-around activity.
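
To illustrate how a CFG can accept a primitive action sequence, the following is a minimal sketch of CYK recognition over a toy grammar in Chomsky normal form. The grammar, the token names ("k" for a kick, "r" for a receive) and the function name cyk_accepts are hypothetical and are not taken from the thesis. The sketch covers only the membership test, not grammar induction, and the nested pattern it accepts (n kicks followed by n receives) is chosen only because no finite-state model can capture it for unbounded n.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (an assumption for
# illustration; not the grammar used in the thesis).
#   S -> K R | K X
#   X -> S R
#   K -> "k"   (kick)
#   R -> "r"   (receive)
# It generates n kicks followed by n receives, for n >= 1.
UNARY = {"k": {"K"}, "r": {"R"}}      # terminal -> nonterminals
BINARY = {                            # (B, C) -> nonterminals A with A -> B C
    ("K", "R"): {"S"},
    ("K", "X"): {"S"},
    ("S", "R"): {"X"},
}


def cyk_accepts(tokens, start="S"):
    """Return True if the token sequence is derivable from `start`."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals deriving tokens[i : i + j + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0] = set(UNARY.get(tok, ()))
    for length in range(2, n + 1):             # span length
        for i in range(n - length + 1):        # span start
            for split in range(1, length):     # size of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b, c in product(left, right):
                    table[i][length - 1] |= BINARY.get((b, c), set())
    return start in table[0][n - 1]


if __name__ == "__main__":
    for seq in ("kr", "kkrr", "kkr", "krkr"):
        print(seq, cyk_accepts(list(seq)))
```

The table is filled bottom-up by span length, so each cell is computed from shorter spans that are already complete; the running time is cubic in the sequence length for a fixed grammar.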