Automated learning rate search using batch-level cross-validation



KABAKÇI D., AKBAŞ E.

Sakarya University Journal of Computer and Information Sciences (Online), vol. 4, no. 3, pp. 312-325, 2021 (Peer-Reviewed Journal)

Abstract

Deep learning researchers and practitioners have accumulated a significant amount of experience on training a wide variety of architectures on various datasets. However, given a network architecture and a dataset, obtaining the best model (i.e. the model giving the smallest test set error) while keeping the training time complexity low is still a challenging task. Hyper-parameters of deep neural networks, especially the learning rate and its (decay) schedule, highly affect the network's final performance. The general approach is to search for the best learning rate and learning rate decay parameters within a cross-validation framework, a process that usually requires a significant amount of experimentation with extensive time cost. In classical cross-validation (CV), a random part of the dataset is reserved for the evaluation of model performance on unseen data. This procedure is usually run multiple times, with different random validation sets, to decide on learning rate settings. In this paper, we explore batch-level cross-validation as an alternative to the classical dataset-level, hence macro, CV. The advantage of batch-level, or micro, CV methods is that the gradient computed during training is re-used to evaluate several different learning rates. We propose an algorithm based on micro CV and stochastic gradient descent with momentum, which produces a learning rate schedule during training by automatically selecting a learning rate per epoch. In our algorithm, a random half of the current batch (of examples) is used for training and the other half is used for validating several different step sizes, or learning rates. We conducted comprehensive experiments on three datasets (CIFAR10, SVHN and Adience) using three different network architectures (a custom CNN, ResNet and VGG) to compare the performance of our micro-CV algorithm with that of the widely used stochastic gradient descent with momentum in an early-stopping macro-CV setup. The results show that our micro-CV algorithm achieves comparable test accuracy to macro-CV at a much lower computational cost.
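To make the idea concrete, the following is a minimal PyTorch sketch of one batch-level CV step, written for illustration only. All names (micro_cv_step, candidate_lrs, and so on) are ours, not the authors' code, and whereas the paper aggregates these per-batch evaluations to select one learning rate per epoch, the sketch shows only the per-batch step: the gradient is computed once on a random half of the batch and re-used to score every candidate learning rate on the other half.

    import torch
    import torch.nn.functional as F

    def micro_cv_step(model, batch_x, batch_y, candidate_lrs,
                      momentum_buf=None, momentum=0.9):
        """Train on a random half of the batch, score each candidate
        learning rate on the other half, and commit the best update.
        (Illustrative sketch, not the authors' implementation.)"""
        n = batch_x.size(0)
        perm = torch.randperm(n)
        train_idx, val_idx = perm[: n // 2], perm[n // 2:]

        # The gradient is computed once, on the training half, and
        # re-used for every candidate learning rate.
        model.zero_grad()
        loss = F.cross_entropy(model(batch_x[train_idx]), batch_y[train_idx])
        loss.backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]

        if momentum_buf is None:
            momentum_buf = [torch.zeros_like(g) for g in grads]
        # SGD-with-momentum update direction, shared by all candidates.
        directions = [momentum * b + g for b, g in zip(momentum_buf, grads)]

        best_lr, best_val = None, float("inf")
        with torch.no_grad():
            saved = [p.detach().clone() for p in model.parameters()]
            for lr in candidate_lrs:
                # Tentatively apply the update with this learning rate...
                for p, d in zip(model.parameters(), directions):
                    p -= lr * d
                # ...validate it on the held-out half of the batch...
                val_loss = F.cross_entropy(
                    model(batch_x[val_idx]), batch_y[val_idx]).item()
                if val_loss < best_val:
                    best_lr, best_val = lr, val_loss
                # ...and restore the parameters before the next candidate.
                for p, p0 in zip(model.parameters(), saved):
                    p.copy_(p0)
            # Commit the update with the winning learning rate.
            for p, d, b in zip(model.parameters(), directions, momentum_buf):
                b.copy_(d)
                p -= best_lr * d
        return best_lr, momentum_buf

Because every candidate shares the same gradient and momentum direction, trying k learning rates costs k extra forward passes on half a batch rather than k full training runs, which is the source of the computational savings over dataset-level (macro) CV.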