Thesis Type: Postgraduate
Institution Of The Thesis: Middle East Technical University, Graduate School of Natural and Applied Sciences, Turkey
Approval Date: 2019
Thesis Language: English
Student: KADER BELLİ
Principal Supervisor (For Co-Supervisor Theses): Emre Akbaş
Co-Supervisor: Adnan Yazıcı
Abstract: The analysis of lifelogging data has generated great interest among data scientists because lifelogging activities produce large-scale, multidimensional and multimodal data. In this study, we use the NTCIR Lifelog dataset, in which the daily lives of two users are monitored for a total of 90 days and archived as a set of minute-based records containing details such as semantic location, body measurements, listening history, and user activity. In addition, images captured automatically by chest-mounted cameras worn by the users are available for each minute, together with text annotations, which reinforces the multimodal nature of the dataset. We train and evaluate several classification methods on the text and image data separately, as well as on their combination. Specifically, for the text data, we encode the words using one-hot encoding and train SVM and MLP models on bag-of-words representations of minutes. For the image data, we train two different convolutional neural networks (CNNs) in two different ways: from scratch and by fine-tuning an ImageNet pre-trained model. Finally, we propose a multi-loss, combined CNN-MLP model that processes image and text data simultaneously, uses fusion methods to merge the two sub-models, and can handle missing input modalities. We also contribute to the NTCIR Lifelog dataset by manually labeling 90,000 images into 16 activity classes.
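
The text pipeline described in the abstract can be illustrated with a minimal sketch (not the thesis code) using scikit-learn: minute-level annotations are turned into bag-of-words vectors, and an SVM and an MLP are trained on them. The example annotations and activity labels below are hypothetical placeholders for the NTCIR Lifelog text data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier

# Hypothetical minute-level annotations and activity labels.
minutes = [
    "desk office laptop screen keyboard",
    "kitchen coffee cup plate table",
    "street trees pavement people walking",
    "car steering wheel road driving",
]
labels = ["working", "eating", "walking", "commuting"]

# Bag-of-words (binary, one-hot per word) encoding of the annotations.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(minutes)

# Train an SVM and an MLP on the same bag-of-words features.
svm = LinearSVC().fit(X, labels)
mlp = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0).fit(X, labels)

# Predict the activity of a new, unseen minute annotation.
new_minute = vectorizer.transform(["laptop screen office desk"])
print(svm.predict(new_minute), mlp.predict(new_minute))
```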
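
For the image models, a short fine-tuning sketch conveys the second training strategy: an ImageNet pre-trained network has its final layer replaced by a 16-way activity classifier and is further trained on the lifelog images. The snippet below uses PyTorch/torchvision; the choice of ResNet-18, the frozen backbone, and the dummy batch are assumptions for illustration, not the networks used in the thesis.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 16  # activity classes from the manual labeling described above

# Load a ResNet-18 pre-trained on ImageNet and replace its final layer.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Optionally freeze the backbone so only the new head is trained at first.
for name, param in model.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3
)
criterion = nn.CrossEntropyLoss()

# One hypothetical training step on a dummy batch (real data would come
# from a DataLoader over the lifelog images).
images = torch.randn(8, 3, 224, 224)
targets = torch.randint(0, NUM_CLASSES, (8,))
optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()
```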
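
The multi-loss, combined CNN-MLP idea can be sketched as follows. This is an assumed architecture for illustration only: a small CNN branch for images and an MLP branch for bag-of-words text are fused by concatenation, each branch keeps an auxiliary classification head (giving three losses in total), and a missing modality is replaced by a zero feature vector. The layer sizes and the equal weighting of the losses are not taken from the thesis.

```python
import torch
import torch.nn as nn

class LifelogFusionNet(nn.Module):
    def __init__(self, vocab_size=1000, num_classes=16):
        super().__init__()
        # Image branch: tiny CNN ending in a 128-d feature vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 128), nn.ReLU(),
        )
        # Text branch: MLP over bag-of-words vectors, also 128-d.
        self.mlp = nn.Sequential(
            nn.Linear(vocab_size, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # Per-branch auxiliary heads plus a head on the fused features.
        self.image_head = nn.Linear(128, num_classes)
        self.text_head = nn.Linear(128, num_classes)
        self.fused_head = nn.Linear(256, num_classes)

    def forward(self, image=None, text=None, batch_size=None):
        # A missing modality is represented by a zero feature vector.
        img_feat = self.cnn(image) if image is not None else torch.zeros(batch_size, 128)
        txt_feat = self.mlp(text) if text is not None else torch.zeros(batch_size, 128)
        fused = torch.cat([img_feat, txt_feat], dim=1)
        return self.image_head(img_feat), self.text_head(txt_feat), self.fused_head(fused)

model = LifelogFusionNet()
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for one minute's image and bag-of-words text.
images, text = torch.randn(4, 3, 224, 224), torch.rand(4, 1000)
targets = torch.randint(0, 16, (4,))
img_logits, txt_logits, fused_logits = model(images, text)

# Total loss sums the three branch losses (equal weighting is an assumption).
loss = (criterion(img_logits, targets)
        + criterion(txt_logits, targets)
        + criterion(fused_logits, targets))
loss.backward()
```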