Multilingual Domain Adaptation for Speech Recognition Using LLMs


Ulu E. N., Derya E., Tumer D., Demirel B., Karamanlioglu A.

28th International Conference on Text, Speech and Dialogue (TSD 2025), Erlangen, Germany, 25-28 August 2025, vol. 16029, pp. 381-393 (Full Text)

  • Publication Type: Conference Paper / Full Text
  • Volume: 16029
  • DOI Number: 10.1007/978-3-032-02548-7_32
  • City: Erlangen
  • Country: Germany
  • Page Numbers: pp.381-393
  • Middle East Technical University Affiliated: Yes

Abstract

We present a practical pipeline for multilingual domain adaptation in automatic speech recognition (ASR) that combines the Whisper model with large language models (LLMs). Using Aya-23-8B, Common Voice transcripts in 22 languages are automatically classified into the Law and Healthcare domains, producing high-quality domain labels at a fraction of the manual cost. These labels drive parameter-efficient (LoRA) fine-tuning of Whisper and deliver consistent relative Word Error Rate (WER) reductions of up to 14.3% for languages that contribute at least 800 in-domain utterances. A data-volume analysis reveals a clear breakpoint: gains become reliably large once that 800-utterance threshold is crossed, while monolingual tuning still rescues performance in truly low-resource settings. The workflow therefore shifts the key success factor from expensive hand labelling to scalable data acquisition, and can be replicated in new domains with minimal human intervention.
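The headline metric, relative WER reduction, and the 800-utterance breakpoint can be sketched as follows. This is an illustrative computation only: the language names, utterance counts, and WER values below are hypothetical placeholders, not results from the paper.

```python
# Illustrative sketch: all counts and WER values here are hypothetical,
# not figures reported in the paper.

def relative_wer_reduction(wer_base: float, wer_finetuned: float) -> float:
    """Relative WER reduction: (base - finetuned) / base."""
    return (wer_base - wer_finetuned) / wer_base

# Hypothetical per-language in-domain utterance counts and WERs.
languages = {
    "lang_a": {"utterances": 1200, "wer_base": 0.28, "wer_ft": 0.24},
    "lang_b": {"utterances": 300,  "wer_base": 0.35, "wer_ft": 0.34},
}

THRESHOLD = 800  # data-volume breakpoint described in the abstract

for name, stats in languages.items():
    gain = relative_wer_reduction(stats["wer_base"], stats["wer_ft"])
    regime = "above" if stats["utterances"] >= THRESHOLD else "below"
    print(f"{name}: {gain:.1%} relative WER reduction ({regime} threshold)")
```

With these placeholder numbers, `lang_a` (above the threshold) shows a 14.3% relative reduction, while `lang_b` (below it) shows only 2.9%, mirroring the breakpoint pattern the abstract describes.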