Protein solvent accessibility prediction using support vector machines and sequence conservations

Ogul, Hasan; Mumcuoglu, Ünal

Ogul H., Mumcuoglu E. U.

ARTIFICIAL INTELLIGENCE AND NEURAL NETWORKS, vol. 3949, pp. 141-148, 2006 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 3949
Publication Date: 2006
Journal Name: ARTIFICIAL INTELLIGENCE AND NEURAL NETWORKS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, EMBASE, MathSciNet, Philosopher's Index, zbMATH
Pages: pp. 141-148
Middle East Technical University Affiliated: Yes

Abstract

A two-stage method is developed for single-sequence prediction of protein solvent accessibility solely from the amino acid sequence. The first stage classifies each residue in a protein sequence as exposed or buried using a support vector machine (SVM). The features used in the SVM are physicochemical properties of the amino acid to be predicted, together with information from its neighboring residues. In the second stage, the SVM-based predictions are refined using pairwise conservative patterns called maximal unique matches (MUMs). The MUMs are identified using an efficient data structure called a suffix tree. The baseline predictions, SVM-based predictions and MUM-based refinements are tested on a nonredundant protein data set, and approximately 73% prediction accuracy is achieved for a solvent accessibility threshold that provides an even distribution between the buried and exposed classes. The results demonstrate that the new method achieves slightly better accuracy than recent methods using single-sequence prediction.
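
To make the notion of a maximal unique match concrete, the following is a minimal Python sketch. It assumes the common definition of a MUM: a substring that occurs exactly once in each of two sequences and cannot be extended to the left or right without producing a mismatch. For brevity it uses a brute-force scan instead of the suffix tree mentioned in the abstract, so it is suitable only for short sequences. The function names, the `min_len` parameter and the example sequences are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple


def occurs_once(text: str, pattern: str) -> bool:
    """Return True if pattern occurs exactly once in text (overlaps counted)."""
    first = text.find(pattern)
    return first != -1 and text.find(pattern, first + 1) == -1


def maximal_unique_matches(a: str, b: str, min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Brute-force MUM search between sequences a and b.

    Returns (start_in_a, start_in_b, length) triples. This is a naive
    stand-in for a suffix-tree search and is quadratic or worse in time.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left (not left-maximal).
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the two sequences agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            match = a[i:i + k]
            # Keep the match only if it is unique in both sequences.
            if occurs_once(a, match) and occurs_once(b, match):
                mums.append((i, j, k))
    return mums


# Hypothetical toy sequences sharing the segment "TAYIAK".
example = maximal_unique_matches("MKTAYIAKQR", "GGTAYIAKWW")
print(example)  # [(2, 2, 6)]
```

A suffix tree finds the same matches in time roughly linear in the combined sequence length, which is presumably why the abstract describes it as efficient. How the matched segments are then used to refine the SVM output is not specified in the abstract, so that step is not sketched here.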