ADVANCES IN INTELLIGENT DATA ANALYSIS VI, PROCEEDINGS, vol. 3646, pp. 283-292, 2005 (SCI-Expanded)
We define a new pairwise sequence comparison scheme for distantly related proteins and report its performance on the remote homology detection task. The new scheme compares two protein sequences using the maximal unique matches (MUMs) between them. Once the MUMs are identified, the lengths of all nonoverlapping MUMs are used to define the similarity between the two sequences. To detect the homology of a protein to a protein family, we use feature vectors containing all pairwise similarity scores between the test protein and the proteins in the training set. Support vector machines are employed for the binary classification, as in recent work. The new method is shown to be more accurate than recent methods including SVM-Fisher and SVM-BLAST, and competitive with SVM-Pairwise. In terms of computational efficiency, the new method performs much better than SVM-Pairwise.
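
The scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`find_mums`, `mum_similarity`, `feature_vector`), the `min_len` threshold, the greedy longest-first choice of nonoverlapping MUMs, and the use of the raw sum of lengths as the score are all assumptions made for illustration. The MUM search is a naive quadratic scan; efficient implementations typically rely on suffix trees or suffix arrays.

```python
def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a, b, min_len=2):
    """Return (pos_in_a, pos_in_b, length) for every maximal unique match.

    A match qualifies if it cannot be extended to the left or right and
    occurs exactly once in each sequence. min_len is a hypothetical cutoff.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the residues agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            s = a[i:i + k]
            if count_occurrences(a, s) == 1 and count_occurrences(b, s) == 1:
                mums.append((i, j, k))
    return mums


def mum_similarity(a, b, min_len=2):
    """Sum the lengths of a greedily chosen set of nonoverlapping MUMs.

    Assumption: longest-first greedy selection; other rules are possible.
    """
    chosen = []
    for i, j, k in sorted(find_mums(a, b, min_len), key=lambda m: -m[2]):
        free_in_a = all(i + k <= ci or ci + ck <= i for ci, _, ck in chosen)
        free_in_b = all(j + k <= cj or cj + ck <= j for _, cj, ck in chosen)
        if free_in_a and free_in_b:
            chosen.append((i, j, k))
    return sum(k for _, _, k in chosen)


def feature_vector(protein, training_set, min_len=2):
    """Similarity of one protein to every training protein, in order."""
    return [mum_similarity(protein, t, min_len) for t in training_set]


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    training = ["ACDEFWGHIK", "MKVLA", "CCCC"]
    print(feature_vector("ACDEFGHIK", training))
```

Each protein is thus mapped to a vector with one entry per training sequence. Vectors of this kind, labeled by family membership, would then be passed to a support vector machine for the binary classification step.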