Evaluating the quality of visual explanations on chest X-ray images for thorax diseases classification

Rahimiaghdam, Shakiba; ALEMDAR, HANDE

doi:10.1007/s00521-024-09587-0

Rahimiaghdam S., ALEMDAR H.

Neural Computing and Applications, vol.36, no.17, pp.10239-10255, 2024 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 36 Issue: 17
Publication Date: 2024
DOI Number: 10.1007/s00521-024-09587-0
Journal Name: Neural Computing and Applications
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Biotechnology Research Abstracts, Compendex, Computer & Applied Sciences, Index Islamicus, INSPEC, zbMATH
Pages: pp.10239-10255
Keywords: Deep neural networks, Explainable artificial intelligence, Machine learning, Medical image classification
Middle East Technical University Affiliated: Yes

Abstract

Deep learning models are extensively used but often lack transparency because of their complex internal mechanics. The field of explainable AI (XAI) strives to make these models more interpretable, but a significant obstacle is the absence of quantifiable metrics for evaluating explanation quality. Existing techniques rely on manual assessment or inadequate metrics and are therefore limited in scalability, reproducibility, and trustworthiness. This study addresses the quality assessment of visual explanations in medical imaging, where interpretability profoundly influences diagnostic accuracy and trust in AI-assisted decisions. It introduces novel criteria such as informativeness, localization, coverage, multi-target capturing, and proportionality, and uses them to build a comprehensive method for the objective assessment of explainability algorithms and to identify the most suitable evaluation metrics. The study examines existing metrics that have been prevalent in recent works for similar applications and proposes new ones. Rigorous analysis led to selecting Jensen–Shannon divergence (JS_DIV) as the most effective metric for visual explanation quality. The approach is applied to the multi-label, multi-class diagnosis of thoracic diseases: a classifier is trained on the CheXpert dataset, and local interpretable model-agnostic explanations (LIME) with diverse segmentation strategies are used to interpret its decisions. A qualitative analysis on an unseen subset of the VinDr-CXR dataset evaluates the candidate metrics and confirms JS_DIV’s superiority. The subsequent quantitative analysis optimizes LIME’s hyper-parameters and benchmarks its performance across various segmentation algorithms, underscoring the utility of an objective assessment metric in practical applications.
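
To illustrate how a divergence-based score such as JS_DIV might compare an explanation with an expert annotation, the following is a minimal sketch, not the authors' implementation. It assumes that the explanation heatmap and the ground-truth region mask are both non-negative arrays of the same shape and that each is normalized to sum to 1 before comparison; the abstract does not specify these details. The function name `js_divergence` and the toy arrays are hypothetical.

```python
import numpy as np


def js_divergence(explanation, annotation):
    """Jensen-Shannon divergence (base 2) between two non-negative maps.

    Assumption: both maps are flattened and normalized into probability
    distributions; the paper's exact preprocessing may differ.
    """
    p = np.asarray(explanation, dtype=np.float64).ravel()
    q = np.asarray(annotation, dtype=np.float64).ravel()
    if p.shape != q.shape:
        raise ValueError("maps must have the same number of pixels")
    if p.min() < 0 or q.min() < 0:
        raise ValueError("maps must be non-negative")
    if p.sum() == 0 or q.sum() == 0:
        raise ValueError("maps must contain some positive mass")

    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL sum.
        # Wherever a > 0, b = m >= a / 2 > 0, so the ratio is finite.
        support = a > 0
        return float(np.sum(a[support] * np.log2(a[support] / b[support])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy 4x4 example (hypothetical data): a ground-truth box and two explanations.
truth = np.zeros((4, 4))
truth[1:3, 1:3] = 1.0        # annotated lesion region

good = truth.copy()          # explanation that matches the annotation
bad = np.zeros((4, 4))
bad[0, 0] = 1.0              # explanation that highlights an unrelated pixel

score_good = js_divergence(good, truth)
score_bad = js_divergence(bad, truth)
```

With a base-2 logarithm the score lies in [0, 1]: it is 0 when the two normalized maps are identical and 1 when they do not overlap at all, so lower values indicate closer agreement between explanation and annotation under these assumptions.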