SPEECH DETECTION ON BROADCAST AUDIO


Zubari U., Ozan E. C., Acar B. O., ÇİLOĞLU T., Esen E., Ates T. K., ...

18th European Signal Processing Conference (EUSIPCO), Aalborg, Denmark, 23-27 August 2010, pp. 85-89

  • City of publication: Aalborg
  • Country of publication: Denmark
  • Pages: 85-89

Abstract

Speech boundary detection contributes to the performance of speech-based applications such as speech recognition and speaker recognition. The speech boundary detector implemented in this study works on broadcast audio as a pre-processing module of a keyword spotter. Speech boundary detection is handled in three steps. First, the audio data is segmented into homogeneous regions in an unsupervised manner. Second, an ACTIVITY/NON-ACTIVITY decision is made for each region. Finally, the ACTIVITY regions are classified as Speech/Non-speech using Gaussian Mixture Model (GMM) based classification. The GMMs are trained using a novel feature, Spectral Flow Direction (SFD), and an improved multi-band harmonicity feature, in addition to the widely used Mel Frequency Cepstral Coefficients (MFCCs).
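The three-step flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the segmentation algorithm or the ACTIVITY criterion, so the sketch substitutes a simple feature-change split (`segment_by_change`) and a mean-energy threshold (`is_active`), both hypothetical. It also takes precomputed feature vectors instead of computing MFCC, SFD, or harmonicity features. The GMMs (`DiagGMM`) use diagonal covariances and toy parameters. Each ACTIVITY region is labeled by comparing its summed per-frame log-likelihood under a Speech GMM with that under a Non-speech GMM.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGMM:
    """Gaussian mixture with diagonal covariances (hypothetical toy parameters)."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x) for one feature vector, combined with log-sum-exp for stability."""
        log_terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            lg = math.log(w)
            for xi, m, v in zip(x, mu, var):
                lg -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            log_terms.append(lg)
        top = max(log_terms)
        return top + math.log(sum(math.exp(t - top) for t in log_terms))


def segment_by_change(frames, threshold):
    """Hypothetical stand-in for unsupervised segmentation: start a new region
    whenever consecutive feature vectors are farther apart than `threshold`."""
    regions = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if math.dist(prev, cur) > threshold:
            regions.append([])
        regions[-1].append(cur)
    return regions


def is_active(region, energy_threshold):
    """Hypothetical ACTIVITY decision: feature 0 is treated as log-energy,
    and the region is active if its mean exceeds the threshold."""
    return sum(f[0] for f in region) / len(region) > energy_threshold


def classify_region(region, speech_gmm, nonspeech_gmm):
    """Speech/Non-speech decision from summed per-frame log-likelihoods."""
    ls = sum(speech_gmm.log_likelihood(f) for f in region)
    ln = sum(nonspeech_gmm.log_likelihood(f) for f in region)
    return "Speech" if ls > ln else "Non-speech"


def detect(frames, speech_gmm, nonspeech_gmm, change_thr=3.0, energy_thr=0.0):
    """Segment, make the ACTIVITY decision, then classify ACTIVITY regions.
    Returns (label, region_length_in_frames) pairs."""
    labels = []
    for region in segment_by_change(frames, change_thr):
        if not is_active(region, energy_thr):
            labels.append(("NON-ACTIVITY", len(region)))
        else:
            labels.append((classify_region(region, speech_gmm, nonspeech_gmm), len(region)))
    return labels


# Toy 2-D features: [log-energy, harmonicity-like value] (illustrative only).
speech_gmm = DiagGMM([0.5, 0.5], [[2.0, 2.0], [3.0, 2.5]], [[1.0, 1.0], [1.0, 1.0]])
nonspeech_gmm = DiagGMM([0.5, 0.5], [[2.0, -2.0], [-5.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]])
frames = [
    [-5.0, 0.0], [-5.2, 0.1], [-4.9, -0.1],   # low energy (silence-like)
    [2.0, 2.1], [2.2, 1.9], [1.9, 2.0],       # speech-like
    [2.1, -2.0], [2.0, -2.2], [1.8, -1.9],    # non-speech-like (e.g. music)
]
result = detect(frames, speech_gmm, nonspeech_gmm)
print(result)  # [('NON-ACTIVITY', 3), ('Speech', 3), ('Non-speech', 3)]
```

Summing frame log-likelihoods over a whole region makes the decision per region rather than per frame. In practice, the GMM parameters would be trained, for example with the EM algorithm, on labeled speech and non-speech data rather than set by hand.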