Discriminative remote homology detection using maximal unique sequence matches


OGUL H., Mumcuoglu U.

ADVANCES IN INTELLIGENT DATA ANALYSIS VI, PROCEEDINGS, vol. 3646, pp. 283-292, 2005 (Journal indexed in SCI)

  • Volume: 3646
  • Publication date: 2005
  • Journal name: ADVANCES IN INTELLIGENT DATA ANALYSIS VI, PROCEEDINGS
  • Pages: pp. 283-292

Abstract

We define a new pairwise sequence comparison scheme for distantly related proteins and report its performance on the remote homology detection task. The new scheme compares two protein sequences using the maximal unique matches (MUMs) between them. Once these are identified, the lengths of all non-overlapping MUMs are used to define the similarity between the two sequences. To detect the homology of a protein to a protein family, we use feature vectors containing all pairwise similarity scores between the test protein and the proteins in the training set. Support vector machines (SVMs) are employed for binary classification, in the same way as in recent work. The new method is shown to be more accurate than recent methods, including SVM-Fisher and SVM-BLAST, and competitive with SVM-Pairwise. In terms of computational efficiency, the new method substantially outperforms SVM-Pairwise.
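The MUM-based comparison can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: MUMs are found by brute force (every maximal exact match whose substring occurs exactly once in each sequence) rather than with a suffix tree, the non-overlapping subset is chosen greedily (longest first), and the similarity is the plain sum of the chosen lengths with no normalization. The helper names `count_occurrences`, `maximal_unique_matches`, `mum_similarity` and `feature_vector` are hypothetical.

```python
from typing import List, Tuple

# A MUM is represented as (start_in_a, start_in_b, length).
Mum = Tuple[int, int, int]


def count_occurrences(s: str, sub: str) -> int:
    """Count possibly overlapping occurrences of sub in s."""
    count, start = 0, s.find(sub)
    while start != -1:
        count += 1
        start = s.find(sub, start + 1)
    return count


def maximal_unique_matches(a: str, b: str, min_len: int = 1) -> List[Mum]:
    """Brute-force MUM finder (assumption: adequate for short proteins).

    Enumerates every maximal exact match (a position pair that cannot be
    extended left or right) and keeps those whose matched substring occurs
    exactly once in a and exactly once in b.
    """
    mums: List[Mum] = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Left-maximal: the match must not extend one residue to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the residues agree (right-maximal).
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length < min_len:
                continue
            sub = a[i:i + length]
            # Uniqueness: the substring must occur once in each sequence.
            if count_occurrences(a, sub) == 1 and count_occurrences(b, sub) == 1:
                mums.append((i, j, length))
    return mums


def mum_similarity(a: str, b: str, min_len: int = 1) -> int:
    """Hypothetical score: total length of a greedily chosen set of
    non-overlapping MUMs, taking the longest MUMs first."""
    used_a, used_b = set(), set()
    total = 0
    for i, j, length in sorted(maximal_unique_matches(a, b, min_len),
                               key=lambda m: -m[2]):
        span_a = set(range(i, i + length))
        span_b = set(range(j, j + length))
        # Skip any MUM that overlaps an already chosen one in either sequence.
        if span_a & used_a or span_b & used_b:
            continue
        used_a |= span_a
        used_b |= span_b
        total += length
    return total


def feature_vector(query: str, training: List[str]) -> List[int]:
    """One similarity score per training protein, used as the SVM input."""
    return [mum_similarity(query, t) for t in training]


if __name__ == "__main__":
    print(maximal_unique_matches("ACDEFG", "XACDEY"))  # [(0, 1, 4)]
    print(feature_vector("ACDEFG", ["XACDEY", "ABCD"]))  # [4, 3]
```

In this sketch, the vectors returned by `feature_vector` for the training and test proteins would then be passed to any standard SVM implementation for the binary family/non-family decision; that step is not shown. The quadratic enumeration is chosen for readability. Tools such as MUMmer find MUMs with suffix trees in time linear in the sequence lengths.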