PREDICTION OF TRANSMEMBRANE REGIONS OF G PROTEIN-COUPLED RECEPTORS USING MACHINE LEARNING TECHNIQUES


Thesis Type: Postgraduate

Institution Of The Thesis: Middle East Technical University, Graduate School of Natural and Applied Sciences, Turkey

Approval Date: 2019

Thesis Language: English

Student: MUAZZEZ ÇELEBİ ÇINAR

Principal Supervisor: Çağdaş Devrim Son

Co-Supervisor: Tolga Can

Abstract:

G protein-coupled receptors (GPCRs) are one of the largest and most significant membrane receptor families in eukaryotes. They transmit extracellular stimuli to the inside of the cell by undergoing conformational changes. GPCRs recognize a wide range of extracellular ligands, including hormones, neurotransmitters, odorants, photons, and ions. These receptors are associated with a variety of human diseases, such as cancer and central nervous system disorders, and are among the most important targets for the pharmaceutical industry. They have seven transmembrane helices that contain essential regions such as ligand binding sites, actuator protein (e.g. G protein) binding sites, and cholesterol binding sites. There is a large gap in topology data for membrane proteins because of experimental limitations resulting from the instability of the membrane. In UniProt, a freely available database of protein sequences and structural and functional information, only 29 GPCRs among the thousands listed have experimentally solved transmembrane (TM) region data. The topology information of other membrane proteins is provided by the TMHMM prediction tool, which is based on hidden Markov models. However, TMHMM incorrectly predicts the total number of TM regions for 6 of the 29 experimentally determined GPCRs. In this study, we aim to develop a GPCR-specific TM prediction algorithm using machine learning techniques. The algorithm is based on the hydrophobicity of each amino acid in the protein sequence and on the secondary structure. Two hydrophobicity scales, Moon-Fleming and Kyte-Doolittle, are implemented separately. The secondary structures are derived from the JPred server. With this algorithm, we obtain more than 85% accuracy with a higher true positive rate.
The results could inform further research and facilitate structure-based drug discovery, with additional therapeutic opportunities for many diseases.
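
The abstract names two inputs for the predictor: per-residue hydrophobicity and secondary structure. The following is a minimal Python sketch of how such per-residue features could be assembled; it is not the thesis's implementation. The Kyte-Doolittle values are the published scale, but the function names, the 19-residue window, the truncated-edge averaging, and the encoding of the secondary structure as a string with "H" for helix (as in JPred-style output) are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy scale (published values; higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(sequence, window=19):
    """Mean hydropathy of a window centred on each residue.

    The window size is an assumed parameter; windows are truncated
    at the sequence ends rather than padded.
    """
    half = window // 2
    scores = []
    for i in range(len(sequence)):
        segment = sequence[max(0, i - half): i + half + 1]
        total = sum(KYTE_DOOLITTLE[aa] for aa in segment)
        scores.append(total / len(segment))
    return scores


def per_residue_features(sequence, secondary_structure, window=19):
    """Pair each residue's windowed hydropathy with a helix indicator.

    `secondary_structure` is assumed to be a string of the same length
    as `sequence`, with "H" marking predicted helix positions.
    """
    if len(sequence) != len(secondary_structure):
        raise ValueError("sequence and secondary structure lengths differ")
    hydropathy = windowed_hydropathy(sequence, window)
    return [
        (h, 1 if ss == "H" else 0)
        for h, ss in zip(hydropathy, secondary_structure)
    ]
```

Feature rows of this kind could be passed to any standard classifier that labels each residue as TM or non-TM. Swapping in the Moon-Fleming scale would only require replacing the lookup table.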