Pattern search in pathogenic bacterial proteins for localization and secretory systems


Thesis Type: Doctoral (PhD)

Institution: Middle East Technical University (Orta Doğu Teknik Üniversitesi), Faculty of Arts and Sciences, Department of Biological Sciences, Turkey

Thesis Approval Date: 2015

Student: ORHAN ÖZCAN

Co-advisors: TOLGA CAN, GÜLAY ÖZCENGİZ

Abstract:

Computational prediction of bacterial protein localization (BPL) is a useful tool that provides clues about protein function. For pathogenic proteins in particular, determining their subcellular location and secretory pathways has important implications for vaccine and drug design. Cell-surface and secreted proteins of microbes can also serve as biomarkers in sensor applications. Numerous BPL prediction algorithms and programs are currently available; however, most of them produce false positives in order to maximize the number of positive predictions. Moreover, state-of-the-art algorithms, notably PSORT, successfully predict protein localization from sequence information for organisms in general, but they usually fail on pathogenic sequences. Because most pathogenic proteins are surface-localized, there is also a pressing need for pathogen-specific secretion motif search algorithms; such motifs would in turn provide information on bacterial protein localization. In the present work, we built databases of pathogenic sequences and, as a new approach named Pathogenic Sequence Motif Search (PSMS), searched them for selected motifs 5 to 18 amino acids long. The algorithm is based on a total of 52 distinct secretion-associated patterns covering 6 secretory pathways and is used to predict surface and secreted proteins. To validate the success rate of these 52 patterns, we next established datasets for the following protein groups: secreted (3241 members), immunoreactive and patented vaccine (1740), cytoplasmic (2582) and orphan-secreted (2533). The 3241 proteins in the secreted dataset represent the T1SS, T2SS, T3SS, T4SS, T5SS and T6SS secretion systems with 954, 668, 381, 770, 221 and 274 sequences, respectively. The cytoplasmic dataset, in contrast, was used to exclude certain candidate patterns.
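The core idea of a motif search of this kind can be illustrated with a minimal Python sketch. It assumes the patterns are written in PROSITE-style syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats), which the abstract does not state. The function names `prosite_to_regex` and `scan` and the two entries in `HYPOTHETICAL_PATTERNS` are illustrative placeholders, not the 52 PSMS patterns and not the thesis's implementation. Each pattern is wrapped in a zero-width lookahead so that overlapping occurrences are all reported.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into an equivalent Python regular expression.

    Supported elements, separated by '-':
      A              a single residue
      x              any residue
      [ST]           any one of the listed residues
      {P}            any residue except those listed
      e(n) / e(n,m)  repeat element e exactly n times, or n to m times
    PROSITE anchors ('<', '>') are not handled in this sketch.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # excluded residues -> negated class
        else:
            regex = core  # a single residue or a [..] class is already valid regex
        if lo is not None:
            regex += "{" + lo + ("," + hi if hi is not None else "") + "}"
        parts.append(regex)
    return "".join(parts)


# Hypothetical placeholder patterns, keyed by an illustrative secretion-system label.
# These are NOT the 52 secretion-associated patterns used in PSMS.
HYPOTHETICAL_PATTERNS = {
    "T1SS_demo": "G-G-x-G-x-D-x(2)",
    "T3SS_demo": "[ST]-x(2)-L-{P}-R",
}


def scan(sequence: str, patterns: dict[str, str]) -> dict[str, list[int]]:
    """Return {label: [0-based start positions]} for every pattern that matches."""
    hits: dict[str, list[int]] = {}
    for label, pattern in patterns.items():
        # Wrapping the regex in a lookahead makes finditer report overlapping matches.
        rx = re.compile("(?=" + prosite_to_regex(pattern) + ")")
        starts = [m.start() for m in rx.finditer(sequence.upper())]
        if starts:
            hits[label] = starts
    return hits
```

For example, `scan("MKTAGGAGNDLIFSAALSRPQ", HYPOTHETICAL_PATTERNS)` returns `{'T1SS_demo': [4], 'T3SS_demo': [13]}`. Keying each pattern by a secretion-system label lets a hit double as a pathway assignment.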
Of the 52 patterns, 43 were truly secretion-related, each pointing directly to a specific secretion system; the remaining 9 were found in secreted proteins but were not associated with any specific secretion system. Additionally, LC-MS data previously obtained in our laboratories from surface proteome and secretome analyses of Bordetella pertussis were included in the secreted protein dataset. The selected patterns were found, for instance, in 503 of the 1740 proteins in the immunoreactive protein dataset. With the help of these patterns, 75 proteins that had previously been predicted to be intracellular, and had therefore been mistakenly ruled out as potential drug targets or vaccine candidates, were successfully predicted to be surface-associated or secreted. Besides the PSMS program itself, which predicts pathogenic sequences with high accuracy, the separate databases constructed in this work for immunoreactive proteins and for distinct secretory pathways are expected to serve as valuable bioinformatics resources for researchers in the field.
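The validation steps described above reduce largely to counting motif hits across datasets. The following is a minimal Python sketch under several assumptions not stated in the abstract: patterns are given directly as Python regular expressions, a candidate pattern is discarded if it hits more than a hypothetical fraction `max_fraction` of the cytoplasmic (negative) set, and an upstream localization predictor's output is available as a plain `{id: location}` map using the label `"Cytoplasmic"`. All function names, thresholds and toy data are hypothetical. With the abstract's own figures, 503 hits out of 1740 immunoreactive proteins corresponds to a coverage of about 28.9%.

```python
import re


def coverage(sequences: dict[str, str], regexes: list[str]) -> tuple[int, int]:
    """Return (number of sequences with at least one motif hit, dataset size)."""
    compiled = [re.compile(r) for r in regexes]
    n_hit = sum(1 for seq in sequences.values() if any(rx.search(seq) for rx in compiled))
    return n_hit, len(sequences)


def filter_against_negatives(candidates: dict[str, str], negatives: dict[str, str],
                             max_fraction: float = 0.0) -> dict[str, str]:
    """Keep only candidate patterns that hit at most max_fraction of the negative
    set (e.g. cytoplasmic proteins). max_fraction is a hypothetical threshold."""
    if not negatives:
        raise ValueError("negative dataset is empty")
    kept = {}
    for label, regex in candidates.items():
        rx = re.compile(regex)
        fraction = sum(1 for seq in negatives.values() if rx.search(seq)) / len(negatives)
        if fraction <= max_fraction:
            kept[label] = regex
    return kept


def rescue_candidates(predicted: dict[str, str], sequences: dict[str, str],
                      regexes: list[str]) -> list[str]:
    """IDs that an upstream predictor labelled 'Cytoplasmic' but that still carry
    at least one secretion-associated motif, i.e. candidates for re-classification."""
    compiled = [re.compile(r) for r in regexes]
    return [pid for pid, loc in predicted.items()
            if loc == "Cytoplasmic" and any(rx.search(sequences[pid]) for rx in compiled)]
```

For instance, `coverage({"a": "AAGGAGNDQ", "b": "MKLLV", "c": "PPP"}, ["GG.G.D"])` returns `(1, 3)`. Keeping the negative-set filter separate from the coverage count mirrors the abstract's split between pattern selection (cytoplasmic dataset) and pattern evaluation (secreted and immunoreactive datasets).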