Towards finding optimal mixture of subspaces for data classification


Thesis Type: Doctoral (PhD)

Institution Where the Thesis Was Carried Out: Orta Doğu Teknik Üniversitesi, Faculty of Engineering, Department of Computer Engineering, Turkey

Thesis Approval Date: 2003

Student: MOHAMED ELHAFIZ MUSTAFA MUSA

Supervisor: MEHMET VOLKAN ATALAY

Abstract:

In pattern recognition, when data has different structures in different parts of the input space, fitting one global model can be slow and inaccurate. Local learning methods can quickly learn the structure of the data in local regions and consequently offer faster and more accurate model fitting. However, breaking the training data set into smaller subsets may lead to the curse of dimensionality problem, because a training subset may not be large enough to estimate the required set of parameters for the submodels, and increasing the size of the training data may not be possible in many situations. Interestingly, the data in local regions tends to be more correlated. Therefore, decorrelation methods can be used to reduce the data dimensionality and hence the number of parameters; in other words, we can find uncorrelated low-dimensional subspaces that capture most of the data variability. Existing subspace modelling methods have shown better performance than global modelling methods for this type of training data structure. Nevertheless, these methods still require further research, as they suffer from two limitations:

- There is no standard method to specify the optimal number of subspaces.
- There is no standard method to specify the optimal dimensionality for each subspace.

In the current models these two parameters are determined beforehand. In this dissertation we propose and test algorithms that attempt to find a suboptimal number of principal subspaces and a suboptimal dimensionality for each principal subspace automatically.
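The abstract's core idea, partitioning the data into local regions and fitting a low-dimensional principal subspace to each, can be illustrated with a short sketch. The code below is not the algorithm proposed in the dissertation; it is a minimal illustration using scikit-learn's KMeans and PCA, in which the number of subspaces is fixed in advance and each subspace's dimensionality is chosen as the smallest number of components that retains a given fraction of the local variance. The function name fit_local_subspaces and the var_threshold parameter are hypothetical names introduced for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def fit_local_subspaces(X, n_subspaces=3, var_threshold=0.95):
    """Illustrative sketch: partition the data into local regions and fit a
    principal subspace to each region, choosing each subspace's dimensionality
    as the smallest number of components explaining `var_threshold` of the
    local variance. (Not the dissertation's algorithm.)"""
    labels = KMeans(n_clusters=n_subspaces, n_init=10, random_state=0).fit_predict(X)
    subspaces = []
    for k in range(n_subspaces):
        X_k = X[labels == k]
        pca = PCA().fit(X_k)
        cum_var = np.cumsum(pca.explained_variance_ratio_)
        # Smallest number of principal directions reaching the variance threshold.
        dim = int(np.searchsorted(cum_var, var_threshold) + 1)
        subspaces.append({"mean": pca.mean_, "basis": pca.components_[:dim], "dim": dim})
    return labels, subspaces


# Example: three local regions in 10 dimensions, each with low intrinsic dimensionality.
rng = np.random.default_rng(0)
scales = [1, 1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
X = np.vstack([rng.normal(loc=c, scale=scales, size=(100, 10)) for c in (0.0, 5.0, 10.0)])
labels, subspaces = fit_local_subspaces(X, n_subspaces=3)
print([s["dim"] for s in subspaces])
```

In the dissertation's setting, both the number of subspaces and the per-subspace dimensionality are what the proposed algorithms try to determine automatically, rather than being fixed beforehand as they are in this sketch.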