2nd International Conference on Computing, IoT and Data Analytics, ICCIDA 2023, Ciudad Real, Spain, 20 - 21 July 2023, vol.1145 SCI, pp.262-270
Voice disorders are widespread and affect people of all ages, so accurate diagnosis is crucial for effective treatment. With the recent development of artificial intelligence-based audio and speech processing, research on the detection and classification of voice disorders has increased. However, existing work has mostly focused on binary (two-class) classification of voice disorders. Some researchers have also explored multi-class classification, but their results are not promising. In this paper, a framework is proposed for the multi-class classification of voice disorders using OpenL3 embeddings. A pre-trained OpenL3 model is used to extract high-level embedding features from the mel spectrogram. Neighbourhood component analysis (NCA) is then applied for feature selection, and Random Forest (RF), Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) classifiers are each evaluated separately on the selected features. The evaluation and comparison are performed on a balanced subset of the Saarbruecken voice database (SVD). Without any speech enhancement preprocessing, our best model, OpenL3-KNN, improves on existing work by 4.9% in accuracy and 8.7% in F1 score.
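The classification stage described above (NCA followed by RF, SVM and KNN) can be illustrated with a minimal sketch in scikit-learn. This is not the authors' implementation, and several parts are assumptions. Synthetic Gaussian vectors stand in for the OpenL3 embeddings, so no audio or SVD data is involved. scikit-learn's `NeighborhoodComponentsAnalysis` learns a linear projection rather than selecting individual features, so it is only a stand-in for the NCA-based feature selection named in the abstract. The helper names, the embedding dimension, the number of classes and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_embeddings(n_per_class=60, n_classes=3, dim=64, seed=0):
    """Hypothetical stand-in for OpenL3 embeddings: one Gaussian blob per class."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=2.0, size=(n_classes, dim))
    X = np.vstack([c + rng.normal(size=(n_per_class, dim)) for c in centers])
    y = np.repeat(np.arange(n_classes), n_per_class)  # balanced class labels
    return X, y


def evaluate_classifiers(X, y, n_components=8, seed=0):
    """Fit scaler -> NCA projection -> classifier; report accuracy and macro F1."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    classifiers = {
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVM": SVC(kernel="rbf"),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }
    results = {}
    for name, clf in classifiers.items():
        # Assumption: NCA is used as a supervised projection, not true feature selection.
        nca = NeighborhoodComponentsAnalysis(
            n_components=n_components, max_iter=50, random_state=seed
        )
        model = make_pipeline(StandardScaler(), nca, clf)
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        results[name] = {
            "accuracy": accuracy_score(y_te, pred),
            "macro_f1": f1_score(y_te, pred, average="macro"),
        }
    return results


if __name__ == "__main__":
    X, y = make_synthetic_embeddings()
    for name, scores in evaluate_classifiers(X, y).items():
        print(name, scores)
```

Each classifier gets its own NCA fit on the training split only, so the held-out scores are not contaminated by test data. With real recordings, the synthetic generator would be replaced by embeddings extracted from the audio with a pre-trained OpenL3 model.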