SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees

Ogul, Hasan; Mumcuoglu, Ünal

doi:10.1016/j.compbiolchem.2006.05.001

Ogul H., Mumcuoglu E. U.

COMPUTATIONAL BIOLOGY AND CHEMISTRY, vol.30, no.4, pp.292-299, 2006 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 30 Issue: 4
Publication Date: 2006
DOI: 10.1016/j.compbiolchem.2006.05.001
Journal Name: COMPUTATIONAL BIOLOGY AND CHEMISTRY
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp.292-299
Keywords: family classification, probabilistic suffix tree, sequence similarity, support vector machine, SEQUENCE, SIMILARITY, DATABASE, SEARCH
Middle East Technical University Affiliation: Yes

Abstract

A new method based on probabilistic suffix trees (PSTs) is defined for the pairwise comparison of distantly related protein sequences. This definition is adopted in a discriminative framework for protein classification that uses pairwise sequence similarity scores for feature encoding. The framework uses support vector machines (SVMs) to separate structurally similar examples from dissimilar ones. The new discriminative system, which we call SVM-PST, has been tested on the SCOP family classification task and compared with the existing discriminative methods SVM-BLAST and SVM-Pairwise, which use BLAST similarity scores and dynamic-programming-based alignment scores, respectively. The results show that SVM-PST is more accurate than SVM-BLAST and competitive with SVM-Pairwise. In terms of computational efficiency, PST-based comparison is much more efficient than dynamic-programming-based alignment. We also compared our results with the original family-based PST approach that inspired this work. The present method provides a significantly better solution for protein classification than the family-based PST model. (c) 2006 Elsevier Ltd. All rights reserved.
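The abstract does not give the exact PST definition, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It builds a simplified PST-like variable-order model from a single sequence (the class `SimplePST` and its parameters `max_depth`, `min_count` and `alpha` are hypothetical), scores another sequence as its average per-residue log-odds against a uniform background, and encodes a query protein as the vector of its scores against a set of training sequences, in the spirit of SVM-Pairwise feature encoding.

```python
import math
from collections import defaultdict

# Standard 20-letter amino-acid alphabet (assumption: no ambiguity codes such as X or B).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class SimplePST:
    """Simplified PST-like model built from one sequence (hypothetical helper).

    Stores next-residue counts for every context of length 0..max_depth.
    This is an illustration, not the pruned PST definition used in the paper.
    """

    def __init__(self, seq, max_depth=3, min_count=2, alpha=0.5):
        self.max_depth = max_depth
        self.min_count = min_count  # contexts seen fewer times are skipped
        self.alpha = alpha  # additive (pseudocount) smoothing
        self.counts = defaultdict(lambda: defaultdict(int))
        for i, sym in enumerate(seq):
            # Record sym after each preceding context of length 0..max_depth.
            for d in range(min(i, max_depth) + 1):
                self.counts[seq[i - d:i]][sym] += 1

    def prob(self, context, sym):
        """P(sym | longest sufficiently frequent suffix of context)."""
        for d in range(min(len(context), self.max_depth), -1, -1):
            ctx = context[len(context) - d:]
            table = self.counts.get(ctx)  # .get does not create missing keys
            if table is None:
                continue
            total = sum(table.values())
            if total >= self.min_count or d == 0:
                return (table.get(sym, 0) + self.alpha) / (
                    total + self.alpha * len(AMINO_ACIDS)
                )
        return 1.0 / len(AMINO_ACIDS)  # empty model: fall back to uniform


def pst_score(pst, query):
    """Average per-residue log-odds of query under pst vs. a uniform background."""
    if not query:
        return 0.0
    background = 1.0 / len(AMINO_ACIDS)
    total = 0.0
    for i, sym in enumerate(query):
        context = query[max(0, i - pst.max_depth):i]
        total += math.log(pst.prob(context, sym) / background)
    return total / len(query)


def feature_vector(query, training_psts):
    """SVM-Pairwise-style encoding: one similarity score per training sequence."""
    return [pst_score(pst, query) for pst in training_psts]


if __name__ == "__main__":
    train = ["ACDEFGHIKLACDEFGHIKL", "WYWYWYWYWY"]
    psts = [SimplePST(s) for s in train]
    print(feature_vector("ACDEFGHIKL", psts))
```

In a full pipeline, the vectors computed for training and test proteins would be passed to an SVM (for example scikit-learn's `SVC`); the scoring here is one-directional against each training sequence, which is a simplifying assumption rather than the published pairwise definition.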