Similarity search in protein sequence databases using metric access methods

Cetintas A., Sacan A., TOROSLU İ. H.

5th International Conference on Bioinformatics and Computational Biology 2013, BICoB 2013, Honolulu, HI, United States, 4-6 March 2013, pp. 131-136 (Full Text Paper)

Publication Type: Conference Paper / Full Text Paper
Volume Number:
City of Publication: Honolulu, HI
Country of Publication: United States
Pages: pp. 131-136
Affiliated with Middle East Technical University: Yes

Abstract

The rapid growth of biological sequence data, driven by advances in high-throughput sequencing, and the increased complexity of hypothesis-driven exploration of this data, which requires a massive number of similarity queries, call for new approaches to managing and analyzing sequence databases. A metric space representation of sequences is well suited to similarity search and makes several sophisticated metric indexing techniques applicable. In this work, we provide a thorough survey and analysis of the application of metric access methods to similarity search in protein sequence databases. We develop a framework that supports different metric space indexing methods, and we use a non-redundant sequence database to benchmark these methods in terms of the number of distance computations incurred and the computation time required during the database compilation and query phases. The parameters of each method are optimized on a subset of the experimental conditions. We demonstrate that Onion-Tree, a hybrid metric access method, performs best in both the index building and querying phases for the protein database investigated, and scales well to large databases, incurring distance computations with 0.5% of the database sequences per query.
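
To illustrate what the abstract means by counting distance computations, the following minimal sketch shows a generic pivot-based metric index. It is not the Onion-Tree, nor the framework developed in the paper. It only demonstrates the principle shared by metric access methods: the triangle inequality lets a query discard candidates without computing their distance. The unit-cost edit distance, the class name `PivotIndex`, and the example sequences are all assumptions chosen for illustration; the abstract does not state which distance function the paper uses.

```python
from typing import Callable, List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance; it satisfies the metric axioms."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class PivotIndex:
    """Hypothetical pivot-table index (illustrative, not the paper's method)."""

    def __init__(
        self,
        items: Sequence[str],
        pivots: Sequence[str],
        dist: Callable[[str, str], int],
    ) -> None:
        self.items = list(items)
        self.pivots = list(pivots)
        self.dist = dist
        # Build phase: one distance computation per (item, pivot) pair.
        self.table = [[dist(x, p) for p in self.pivots] for x in self.items]
        self.build_computations = len(self.items) * len(self.pivots)

    def range_query(self, query: str, radius: int) -> Tuple[List[str], int]:
        """Return items within `radius` of `query` and the number of
        distance computations spent on this query."""
        q_to_pivots = [self.dist(query, p) for p in self.pivots]
        computations = len(self.pivots)
        hits = []
        for item, row in zip(self.items, self.table):
            # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x).
            lower = max(
                (abs(qp - xp) for qp, xp in zip(q_to_pivots, row)), default=0
            )
            if lower > radius:
                continue  # pruned without computing d(q, x)
            computations += 1
            if self.dist(query, item) <= radius:
                hits.append(item)
        return hits, computations


if __name__ == "__main__":
    # Made-up toy sequences, not data from the paper.
    db = ["MKTAYIAKQR", "MKTAYIAKQK", "GSHMLEDPVA", "MKTTYIAKQR", "AAAAAAAAAA"]
    index = PivotIndex(db, pivots=["MKTAYIAKQR"], dist=levenshtein)
    hits, used = index.range_query("MKTAYIAKQR", radius=1)
    print(hits, used, "of", len(db), "possible item comparisons")
```

The ratio of distance computations per query to database size is the kind of figure the abstract reports as 0.5% for Onion-Tree. A toy pivot table such as this one should not be expected to reach that level of pruning.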