Multi-class Classification of Voice Disorders Using Deep Transfer Learning


Rahman M. U., DİREKOĞLU C.

2nd International Conference on Computing, IoT and Data Analytics, ICCIDA 2023, Ciudad Real, Spain, 20 - 21 July 2023, vol.1145 SCI, pp.262-270, (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • Volume: 1145 SCI
  • DOI: 10.1007/978-3-031-53717-2_25
  • City of Publication: Ciudad Real
  • Country of Publication: Spain
  • Page Numbers: pp.262-270
  • Keywords: Multi-class classification, OpenL3, Voice disorder
  • Affiliated with Middle East Technical University: Yes

Abstract

Voice disorders are a widespread issue affecting people of all ages, and accurate diagnosis is crucial for effective treatment. With the recent development of artificial intelligence-based audio and speech processing, research on the detection and classification of voice disorders has increased. However, existing work has mostly focused on binary (two-class) classification of voice disorders. Some researchers have also explored multi-class classification, but the results so far have not been promising. In this paper, a framework is proposed for the multi-class classification of voice disorders using OpenL3 embeddings. A pre-trained OpenL3 model is used to extract high-level embedding features from the mel spectrogram. Neighbourhood component analysis (NCA) based feature selection is then applied, and Random Forest (RF), Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) classifiers are each evaluated separately on the selected features. The evaluation and comparison are performed on a balanced subset of the Saarbruecken voice database (SVD). Without any speech enhancement preprocessing, our best model, OpenL3-KNN, improves on existing work by 4.9% in accuracy and 8.7% in F1 score.
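
The final classification and scoring step can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2-D vectors stand in for NCA-selected OpenL3 features, the labels `class_a`, `class_b` and `class_c` are hypothetical, and the choice of k = 3, Euclidean distance and macro-averaged F1 are all assumptions, since the abstract does not state them.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (the averaging scheme is an assumption)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy 2-D vectors standing in for selected embedding features (hypothetical data).
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
           (0.0, 5.0), (0.2, 5.1), (0.1, 4.8)]
train_y = ["class_a"] * 3 + ["class_b"] * 3 + ["class_c"] * 3

# The last test point is deliberately labelled so that it is misclassified,
# which keeps the two metrics from being trivially equal to 1.
test_X = [(0.1, 0.1), (5.0, 5.1), (0.1, 5.0), (4.0, 4.0)]
test_y = ["class_a", "class_b", "class_c", "class_a"]

pred_y = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print("accuracy:", accuracy(test_y, pred_y))
print("macro F1:", macro_f1(test_y, pred_y))
```

Reporting F1 alongside accuracy matters in the multi-class setting because a single misclassification lowers the score of two classes at once: recall for the true class and precision for the predicted one.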