Secondary data analysis using Evidence-Based Bayesian Networks with an application to investigate the determinants of childhood stunting


YET B., Öykü Başerdem E., Rosenstock T.

Expert Systems with Applications, vol.256, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 256
  • Publication Date: 2024
  • DOI: 10.1016/j.eswa.2024.124940
  • Journal Name: Expert Systems with Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Keywords: Abstraction, Bayesian Networks, Childhood Stunting, Secondary Data
  • Middle East Technical University Affiliated: Yes

Abstract

Secondary data, that is, data previously collected by other researchers for a different purpose, offers a cost-effective and readily available resource for research and policy or program design but presents challenges due to the lack of control over the sampling design or the data. Bayesian Networks (BNs) are well-suited for guiding secondary data analysis as their graphical structure can encode domain knowledge about the causal relationships among factors, and secondary data can be used to learn the nature and strength of these relationships. To build BNs from a combination of knowledge and secondary data, the causal structure is first built based on expert knowledge and published evidence, and then the parameters are learned from the data. However, the variables in secondary data often imperfectly match the variables in the causal BN structure. When ad-hoc structural modifications are made to match the structure and data, the link between the parameterized model and the supporting knowledge and evidence is lost. This paper presents a systematic method of building BNs based on secondary data. We build the BN structure based on published evidence and expert interviews, carefully documenting the origin of evidence for each relation in the BN. We use formal BN abstraction operations to match the expert structure with the secondary data. The causal and associational implications of applying abstraction operations are traced, making it possible to link the original BN with the parameterized model and trace it back to more complicated models when additional data become available. The method is demonstrated by building a BN model for the drivers of childhood stunting. The BN model brings together the rich published evidence in this domain in a BN structure and evidence base while learning the parameters of this model from the Demographic and Health Survey (DHS) datasets for India and Senegal.
We compared the BNs built by our approach to BNs learned purely from secondary data using structure learning algorithms. We found that none of the learning algorithms produced structures close to the evidence-based model. In contrast, the abstraction operations establish a clear link between our models and the evidence. The stunting case study demonstrates the advantages of having a clear evidence base and building a formal link between the evidence and secondary data using abstraction. The resulting models and supporting evidence can be browsed in an online tool.
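
The two-stage idea described in the abstract (fix the structure from knowledge, then learn the parameters from data) can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the three variables, the edges between them, and the toy records are all hypothetical and are not taken from the paper or from the DHS datasets. The estimator shown is plain frequency counting with Laplace smoothing, which is an assumption made here for simplicity.

```python
from collections import Counter, defaultdict

# Hypothetical fixed structure (variable -> list of parents).
# In the paper's setting this would come from experts and published evidence;
# the names and edges below are invented for illustration only.
STRUCTURE = {
    "poor_sanitation": [],
    "diarrhea": ["poor_sanitation"],
    "stunted": ["diarrhea"],
}

# Made-up toy records standing in for secondary data (1 = yes, 0 = no).
ROWS = [
    (1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 0),
    (0, 0, 0), (0, 1, 1), (0, 0, 1), (1, 1, 1),
]
RECORDS = [dict(zip(STRUCTURE, row)) for row in ROWS]


def learn_cpts(structure, records, alpha=1.0):
    """Estimate one conditional probability table per variable.

    The structure is never changed; only the parameters are learned.
    alpha is the Laplace smoothing pseudo-count added to every state.
    """
    cpts = {}
    for child, parents in structure.items():
        states = sorted({r[child] for r in records})
        # Count child states separately for each observed parent configuration.
        counts = defaultdict(Counter)
        for r in records:
            parent_config = tuple(r[p] for p in parents)
            counts[parent_config][r[child]] += 1
        # Turn smoothed counts into probabilities.
        cpts[child] = {
            config: {
                s: (c[s] + alpha) / (sum(c.values()) + alpha * len(states))
                for s in states
            }
            for config, c in counts.items()
        }
    return cpts


cpts = learn_cpts(STRUCTURE, RECORDS)
for variable, table in cpts.items():
    for config, dist in table.items():
        print(variable, "given", config, "->", dist)
```

Root variables get a single table row keyed by the empty tuple. Because the structure is an input rather than something inferred from the records, every learned table can still be traced back to the edge, and hence to the evidence, that motivated it.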