Automated Audio Captioning With Topic Modeling


Creative Commons License

Eren A. O., Sert M.

IEEE ACCESS, cilt.11, ss.4983-4991, 2023 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 11
  • Basım Tarihi: 2023
  • Doi Numarası: 10.1109/access.2023.3235733
  • Dergi Adı: IEEE ACCESS
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Sayfa Sayıları: ss.4983-4991
  • Anahtar Kelimeler: Semantics, Bit error rate, Feature extraction, Transformers, Event detection, Audio systems, Predictive models, Audio captioning, audio event detection, PANNs, topic modeling, BERTopic
  • Orta Doğu Teknik Üniversitesi Adresli: Hayır

Özet

Automatic audio captioning (AAC) is an important area of research aimed at generating meaningful descriptions for audio clips. Most existing methods use relevant semantic information to improve AAC performance and have demonstrated the feasibility of semantic information extraction. Audio events and keywords are commonly used for this purpose. Unlike previous studies, this study proposes a framework that uses topic modeling to obtain relevant semantic content since topic models explore the main themes of the documents. To this end, we present a framework that integrates audio embeddings with audio topics in a transformer-based encoder-decoder architecture. First, we represent each audio clip with a set of topics using a pre-trained topic model, BERTopic. Then, we design a multilayer perceptron (MLP)-based multi-label classifier to predict the topics of audio clips in the testing phase. Finally, in the proposed framework, we input audio embedding and extracted topics into the transformer model to generate captions. The results show that the proposed model improves performance and competes with the most advanced methods that utilize additional external data for training. We believe that the topic modeling can be used to extract semantic content in the AAC task.