IEEE Access, vol. 11, pp. 119481-119505, 2023 (SCI-Expanded)
The increasing use of deep learning (DL) in safety-critical applications highlights the critical need for systematic and effective testing to ensure system reliability and quality. In this context, researchers have conducted various DL testing studies to identify weaknesses in Deep Neural Network (DNN) models, including exploring test coverage, generating challenging test inputs, and selecting tests. In this study, we propose a generic DNN testing framework that takes into account the distribution of test data and prioritizes test inputs based on their potential to cause incorrect predictions by the tested DNN model. We evaluated the proposed framework using image classification as a use case, conducting empirical evaluations in which each phase was implemented with carefully chosen methods. We employed Variational Autoencoders to identify and eliminate out-of-distribution data from the test datasets. Additionally, we prioritized test data that increase the model's uncertainty, as these cases are more likely to reveal potential faults. Eliminating out-of-distribution data enables a more focused analysis of the sources of DNN failures, while using prioritized test data reduces the cost of test data labeling. Furthermore, we explored the use of post-hoc explainability methods to identify the causes of incorrect predictions, a process similar to debugging. This study can serve as a prelude to incorporating explainability methods into the model development process after testing.
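The following is a minimal sketch of the two test-preparation steps the abstract describes: filtering out-of-distribution inputs with a VAE reconstruction-error score, and prioritizing the remaining inputs by predictive uncertainty. It is an illustrative outline under stated assumptions, not the paper's exact implementation; `vae`, `classifier`, `x_test`, and the threshold value are hypothetical placeholders.

```python
import numpy as np
import tensorflow as tf

def ood_filter(vae: tf.keras.Model, x_test: np.ndarray, threshold: float) -> np.ndarray:
    """Keep only inputs the VAE reconstructs well (low per-sample MSE),
    treating poorly reconstructed inputs as out-of-distribution."""
    recon = vae.predict(x_test, verbose=0)
    errors = np.mean((x_test - recon) ** 2, axis=tuple(range(1, x_test.ndim)))
    return np.where(errors <= threshold)[0]  # indices of in-distribution inputs

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of softmax outputs; higher means the model is less certain."""
    eps = 1e-12  # avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def prioritize_by_uncertainty(classifier: tf.keras.Model, x: np.ndarray) -> np.ndarray:
    """Return indices sorted from most to least uncertain, so labeling
    effort goes first to inputs most likely to reveal faults."""
    probs = classifier.predict(x, verbose=0)  # softmax class probabilities
    return np.argsort(predictive_entropy(probs))[::-1]

# Example usage (assuming a trained `vae` and `classifier` exist):
# keep = ood_filter(vae, x_test, threshold=0.02)
# ranked = keep[prioritize_by_uncertainty(classifier, x_test[keep])]
# to_label = ranked[:500]  # spend the labeling budget on the most uncertain inputs
```

Note that reconstruction-error scoring and predictive entropy are only two of several possible choices for the OOD and uncertainty phases; the framework itself is agnostic to the specific methods plugged into each phase.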