Sekmen A., Parlaktuna M., Abdul-Malek A., Erdemir E., Koku A. B.
Discover Artificial Intelligence, vol.1, no.12, pp.1-11, 2021 (Peer-Reviewed Journal)
Abstract
This paper introduces two deep convolutional neural network training
techniques that lead to more robust feature subspace separation in
comparison to traditional training. Assume that the dataset has $M$ labels. The first method creates $M$ deep convolutional neural networks, $\{\mathrm{DCNN}_i\}_{i=1}^{M}$. Each network $\mathrm{DCNN}_i$ is composed of a convolutional neural network ($\mathrm{CNN}_i$) and a fully connected neural network ($\mathrm{FCNN}_i$). During training, a set of projection matrices $\{P_i\}_{i=1}^{M}$ is created and adaptively updated as representations of the feature subspaces $\{S_i\}_{i=1}^{M}$. A rejection value is computed for each training sample based on its projections onto the feature subspaces. Each $\mathrm{FCNN}_i$ acts as a binary classifier with a cost function whose main parameter is the rejection value. A threshold value $t_i$ is determined for the $i$th network $\mathrm{DCNN}_i$. A testing strategy utilizing $\{t_i\}_{i=1}^{M}$ is also introduced.
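The abstract does not give an explicit formula for the rejection value. The sketch below assumes one common choice, the residual norm of a feature vector after orthogonal projection onto each class subspace, with each projector $P_i = U_i U_i^\top$ built from an orthonormal basis $U_i$ of $S_i$; the function names and the usage setup are illustrative, not taken from the paper.

```python
import numpy as np

def projection_matrix(basis: np.ndarray) -> np.ndarray:
    """Orthogonal projector P = U U^T onto span(U).

    `basis` is assumed to have orthonormal columns, e.g. obtained from an
    SVD or QR factorization of the features belonging to one class.
    """
    return basis @ basis.T

def rejection_values(f: np.ndarray, projectors: list[np.ndarray]) -> np.ndarray:
    """Residual norms ||f - P_i f|| of a feature vector against each subspace.

    A small value for subspace S_i suggests f lies close to S_i. This is one
    plausible rejection measure; the paper's exact definition may differ.
    """
    return np.array([np.linalg.norm(f - P @ f) for P in projectors])

# Hypothetical usage: two 3-D subspaces of a 16-D feature space.
rng = np.random.default_rng(0)
bases = [np.linalg.qr(rng.standard_normal((16, 3)))[0] for _ in range(2)]
projectors = [projection_matrix(U) for U in bases]
f = bases[0] @ rng.standard_normal(3)   # feature lying in subspace S_0
print(rejection_values(f, projectors))  # ~0 for S_0, larger for S_1
```

Under this reading, each binary classifier $\mathrm{FCNN}_i$ would be trained to accept samples whose rejection value against $S_i$ falls below the learned threshold $t_i$ and reject the rest.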
The second method creates a single DCNN and
computes a cost function whose parameters depend on subspace separations,
measured by the geodesic distance on the Grassmannian manifold between each subspace $S_i$ and the sum of all remaining subspaces $\{S_j\}_{j=1,\, j\neq i}^{M}$.
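As a hedged illustration of this separation measure, the sketch below computes the standard geodesic (arc-length) distance on the Grassmannian from principal angles; representing the sum of the remaining subspaces by an orthonormal basis of their stacked spans is an assumption made for the example, not necessarily the paper's exact procedure.

```python
import numpy as np

def grassmann_geodesic_distance(U: np.ndarray, V: np.ndarray) -> float:
    """Geodesic distance between subspaces span(U) and span(V).

    U and V must have orthonormal columns. The principal angles theta_k are
    the arccosines of the singular values of U^T V, and the geodesic distance
    on the Grassmannian is the 2-norm of the angle vector.
    """
    sigma = np.linalg.svd(U.T @ V, compute_uv=False)
    theta = np.arccos(np.clip(sigma, -1.0, 1.0))  # clip guards round-off
    return float(np.linalg.norm(theta))

# Hypothetical usage: distance between S_i and the sum of the remaining
# subspaces, the latter represented by a basis of the stacked spans.
rng = np.random.default_rng(1)
bases = [np.linalg.qr(rng.standard_normal((32, 4)))[0] for _ in range(3)]
i = 0
rest = np.hstack([B for j, B in enumerate(bases) if j != i])
U_rest = np.linalg.qr(rest)[0]  # orthonormal basis for sum of S_j, j != i
print(grassmann_geodesic_distance(bases[i], U_rest))
```

A cost term built from such distances would reward networks whose per-class feature subspaces stay far, in the geodesic sense, from the span of all other classes.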
The proposed methods are tested using multiple network topologies. It
is shown that while the first method works better for smaller networks,
the second method performs better for complex architectures.