The Use of Informed Priors in Biclustering of Gene Expression with the Hierarchical Dirichlet Process


Tercan B., Acar A. C.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, cilt.17, sa.5, ss.1810-1821, 2020 (SCI-Expanded) identifier identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 17 Sayı: 5
  • Basım Tarihi: 2020
  • Doi Numarası: 10.1109/tcbb.2019.2901676
  • Dergi Adı: IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Agricultural & Environmental Science Database, BIOSIS, Biotechnology Research Abstracts, Communication Abstracts, Compendex, EMBASE, INSPEC, MEDLINE, Metadex, Civil Engineering Abstracts
  • Sayfa Sayıları: ss.1810-1821
  • Orta Doğu Teknik Üniversitesi Adresli: Evet

Özet

We motivate and describe the application of Hierarchical Dirichlet Process (HDP) models to the "soft" biclustering of gene expression data, in which we obtain modules (biclusters) where the affiliation of genes and samples with the modules are weighted, instead of being hard memberships. As a distinct contribution, we propose a method which HDP is informed with prior beliefs, significantly increasing the quality of the biclustering in terms of both the correctness of the number of modules inferred, and the precision of these modules, especially when evidence is sparse. We outline two such informed priors; one based on co-expression relationships inherent in the data, the other based on an externally provided regulatory network. We validate these results and compare the performance of our approach to Weighted Gene Correlation Network Analysis (WGCNA), another model that features weighted modules. We have, to this end, performed experiments on semi-synthetic data. The results show that HDP, with the addition of a well-informed prior, is able to capture the correct number of modules with increased accuracy. Furthermore, the model becomes robust to changes in the strength of the prior. We conclude by discussing these results and the benefits provided by our approach for gene expression analysis and network validation.