Subcellular localization prediction with new protein encoding schemes

Ogul, Hasan; Mumcuoglu, Ünal

doi:10.1109/tcbb.2007.070209

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, vol.4, no.2, pp.227-232, 2007 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 4 Issue: 2
Publication Date: 2007
DOI Number: 10.1109/tcbb.2007.070209
Journal Name: IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp.227-232
Keywords: n-peptide composition, probabilistic suffix tree, subcellular localization, support vector machines, SUPPORT VECTOR MACHINES, SVM, EVOLUTIONARY, SIMILARITY, LOCATIONS, PSORT
Middle East Technical University Affiliated: Yes

Abstract

Subcellular localization is one of the key properties in the functional annotation of proteins. Support vector machines (SVMs) have been widely used for automated prediction of subcellular localization. Existing methods differ in the protein encoding schemes they use. In this study, we present two protein encoding methods for SVM-based subcellular localization prediction: n-peptide compositions with reduced amino acid alphabets for larger values of n, and pairwise sequence similarity scores based on the whole sequence and the N-terminal sequence. We tested the methods on a common benchmark data set that consists of 2,427 eukaryotic proteins with four localization sites. In 5-fold cross-validation tests, the encoding with n-peptide compositions achieved accuracies of 84.5, 88.9, 66.3, and 94.3 percent for cytoplasmic, extracellular, mitochondrial, and nuclear proteins, respectively, with an overall accuracy of 87.1 percent. The second method achieved 83.6, 87.7, 87.9, and 90.5 percent accuracies for the individual locations and 87.8 percent overall accuracy. A hybrid system, which we call PredLOC, makes a final decision based on the results of the two presented methods; it achieved an overall accuracy of 91.3 percent, which is better than that of many existing methods. The new system also outperformed recent methods in experiments conducted on a new, unique SWISSPROT test set.
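
To illustrate the first encoding named in the abstract, the following is a minimal sketch of an n-peptide composition vector computed over a reduced amino acid alphabet. The four-group alphabet in `GROUPS`, the function names, and the normalization by window count are assumptions made for illustration; the abstract does not specify the alphabets or the exact feature definition used in the paper.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet: the 20 amino acids mapped to 4 groups.
# This grouping is an assumption, not the alphabet used in the paper.
GROUPS = {
    "h": "AVLIMCFWY",  # hydrophobic
    "p": "STNQGP",     # polar / small
    "b": "KRH",        # basic
    "a": "DE",         # acidic
}
REDUCE = {aa: label for label, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group label.

    Symbols outside the 20 standard amino acids are dropped, so the
    residues on either side of a dropped symbol become adjacent.
    """
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def n_peptide_composition(seq, n):
    """Return the relative frequency of every length-n word over the
    reduced alphabet, in a fixed (sorted) order."""
    reduced = reduce_sequence(seq)
    alphabet = sorted(GROUPS)
    keys = ["".join(word) for word in product(alphabet, repeat=n)]
    windows = len(reduced) - n + 1
    if windows <= 0:
        # Sequence shorter than n: no n-peptides to count.
        return [0.0] * len(keys)
    counts = Counter(reduced[i:i + n] for i in range(windows))
    return [counts[key] / windows for key in keys]


# Example: a 2-peptide composition has 4**2 = 16 dimensions.
features = n_peptide_composition("MKDEAL", 2)
```

With a reduced alphabet of size k, the vector has k**n entries instead of 20**n, which is what keeps larger values of n tractable. A vector like `features` could then serve as the input to an SVM classifier.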