Automated learning rate search using batch-level cross-validation


Thesis Type: Master's

Institution: Middle East Technical University (Orta Doğu Teknik Üniversitesi), Graduate School of Natural and Applied Sciences, Turkey

Approval Date: 2019

Language: English

Student: DUYGU KABAKCI

Supervisor: Emre Akbaş

Abstract:

Deep convolutional neural networks are widely used in computer vision tasks, such as object recognition and detection, image segmentation, and face recognition, with a variety of architectures. Deep learning researchers and practitioners have accumulated significant experience in training a wide variety of architectures on various datasets. However, given a specific network model and a dataset, obtaining the best model (i.e., the model giving the smallest test-set error) while keeping the training cost low is still a challenging task. Hyper-parameters of deep neural networks, especially the learning rate and its (decay) schedule, strongly affect the network's final performance. The general approach is to search for the best learning rate and learning rate decay parameters within a cross-validation framework, a process that usually requires a significant amount of experimentation at considerable time cost. In classical cross-validation, a random part of the dataset is reserved to evaluate model performance on unseen data, and this procedure is typically repeated multiple times with different random validation sets to decide on learning rate settings.

This thesis explores batch-level cross-validation methods as an alternative to the classical dataset-level (macro) CV. The advantage of micro CV methods is that the gradient computed during training is re-used to evaluate several different learning rates. We propose automated learning rate selection algorithms that set the learning rate and learning rate schedule during training. Our algorithms use micro cross-validation, where a random half of the current batch (of examples) is used for training and the other half is used for validation.

We present comprehensive experimental results on three well-known datasets (CIFAR-10, SVHN, and Adience) using three different network architectures: a custom CNN, ResNet, and VGG.
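To make the micro-CV idea concrete, the following is a minimal, hypothetical PyTorch sketch, not the thesis's actual implementation; the function name micro_cv_step and the candidate_lrs parameter are illustrative. It shows the key property described in the abstract: the gradient is computed once on a random half of the batch and then re-used to score a tentative update for each candidate learning rate on the held-out half.

```python
# Hypothetical sketch of batch-level (micro) cross-validation for
# learning rate selection; names and details are illustrative.
import copy
import torch

def micro_cv_step(model, loss_fn, batch_x, batch_y, candidate_lrs):
    """One training step: compute the gradient on a random half of the
    batch, evaluate each candidate learning rate on the other half, and
    commit the update that gives the lowest validation loss."""
    n = batch_x.size(0)
    perm = torch.randperm(n)
    tr, va = perm[: n // 2], perm[n // 2 :]

    # Compute the gradient once on the training half; it is re-used
    # to evaluate every candidate learning rate below.
    model.zero_grad()
    loss_fn(model(batch_x[tr]), batch_y[tr]).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]

    best_lr, best_loss = None, float("inf")
    for lr in candidate_lrs:
        trial = copy.deepcopy(model)  # tentative update on a copy
        with torch.no_grad():
            for p, g in zip(trial.parameters(), grads):
                p -= lr * g
            val_loss = loss_fn(trial(batch_x[va]), batch_y[va]).item()
        if val_loss < best_loss:
            best_lr, best_loss = lr, val_loss

    # Commit the winning update to the real model.
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p -= best_lr * g
    return best_lr, best_loss
```

One plausible way to use such a step (again, an assumption rather than the thesis's procedure) is to pass a small candidate set derived from the current learning rate, e.g. the current value scaled by {0.5, 1.0, 2.0}, so the schedule adapts gradually as training progresses.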