A discriminative method for remote homology detection based on n-peptide compositions with reduced amino acid alphabets


OĞUL H., Mumcuoglu E. U.

BIOSYSTEMS, vol.87, no.1, pp.75-81, 2007 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 87 Issue: 1
  • Publication Date: 2007
  • DOI: 10.1016/j.biosystems.2006.03.006
  • Journal Name: BIOSYSTEMS
  • Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Pages: pp.75-81
  • Affiliated with Middle East Technical University: Yes

Abstract

In this study, n-peptide compositions are used for protein vectorization within a discriminative remote homology detection framework based on support vector machines (SVMs). The size of the amino acid alphabet is gradually reduced as n increases so that the method fits within the memory resources of conventional workstations. A hash structure is implemented to accelerate the search for n-peptides. The method is tested for its ability to classify proteins into families on a subset of the SCOP family database and is compared against many existing homology detection methods, including the most popular generative methods, SAM-98 and PSI-BLAST, and the recent SVM methods, SVM-Fisher, SVM-BLAST and SVM-Pairwise. The results demonstrate that the new method significantly outperforms SVM-Fisher, SVM-BLAST, SAM-98 and PSI-BLAST, while achieving accuracy comparable to that of SVM-Pairwise. In terms of efficiency, it performs much better than SVM-Pairwise. The n-peptide compositions with reduced amino acid alphabets are thus shown to provide an accurate and efficient means of protein vectorization for SVM-based sequence classification. (c) 2006 Elsevier Ireland Ltd. All rights reserved.
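
The vectorization step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the six-group alphabet in `REDUCED_GROUPS`, the function names `build_mapping` and `npeptide_composition`, and the normalization to relative frequencies are all assumptions made for illustration, since the abstract does not specify them. A Python dictionary stands in for the hash structure mentioned in the abstract.

```python
from collections import Counter

# Hypothetical reduced alphabet: these six groups are an illustrative
# assumption, not the grouping used in the paper.
REDUCED_GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(groups):
    """Map each amino acid letter to a one-character group symbol."""
    return {aa: str(i) for i, group in enumerate(groups) for aa in group}


def npeptide_composition(sequence, n, mapping):
    """Return the relative frequency of each reduced-alphabet n-peptide.

    Residues missing from the mapping (for example 'X') are skipped.
    The Counter is a hash table keyed by n-peptide, so only n-peptides
    that actually occur in the sequence take up memory.
    """
    reduced = [mapping[aa] for aa in sequence if aa in mapping]
    counts = Counter(
        "".join(reduced[i:i + n]) for i in range(len(reduced) - n + 1)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {peptide: count / total for peptide, count in counts.items()}
```

With k groups there are at most k^n distinct n-peptides, so shrinking the alphabet as n grows keeps the feature space bounded. For example, `npeptide_composition("AVKR", 2, build_mapping(REDUCED_GROUPS))` yields frequencies for the three reduced dipeptides `"00"`, `"03"` and `"33"`; such sparse dictionaries could then be converted to fixed-length vectors for an SVM.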