Acoustic Scene Classification with Modulation Spectrogram Features and a Convolutional Recurrent Network

Sayeh Mirzaei; Saeedeh Davoudi

Paper title: Acoustic Scene Classification with Modulation Spectrogram Features and a Convolutional Recurrent Network
National paper ID: ISAV11_037
Published in: 11th International Conference on Acoustics and Vibration, 1400 (Solar Hijri calendar)

Author details:

Sayeh Mirzaei - School of Engineering Science, College of Engineering, University of Tehran, Tehran, Iran.
Saeedeh Davoudi - School of Engineering Science, College of Engineering, University of Tehran, Tehran, Iran.

Abstract:

One of the major objectives of artificial intelligence systems is to make machines aware of their environment. Acoustic scene classification (ASC) aims to identify the acoustic scene in which a sound was recorded. In this paper, we propose a novel feature extraction approach based on modulation spectrogram features instead of the commonly used Mel spectrogram. The modulation spectrogram provides more discriminative features for classification. We split each recording into several temporal segments and compute the modulation spectrogram of each segment individually. The resulting feature tensors form the input to a Convolutional Long Short-Term Memory (Conv-LSTM) model for classification. The convolutional layers extract the spectral structure of the audio signal, while the LSTM captures temporal information that is useful for classification. The proposed model outperforms state-of-the-art methods in prediction accuracy on the evaluation data of the ASC task on the DCASE 2017 dataset.
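
The feature extraction step described in the abstract (split the recording into temporal segments, then compute one modulation spectrogram per segment) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact definition of the modulation spectrogram, the number of segments, or the window settings. The sketch uses one common definition (a second Fourier transform along the time axis of the magnitude STFT), and the function names `stft_magnitude`, `modulation_spectrogram`, and `segment_features`, the 512/256 frame and hop sizes, the 10 segments, and the synthetic 16 kHz test tone are all hypothetical choices.

```python
import numpy as np

def stft_magnitude(x, frame_len=512, hop=256):
    """Magnitude STFT with a Hann window; returns (n_acoustic_freq, n_frames)."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

def modulation_spectrogram(segment, frame_len=512, hop=256):
    """Assumed definition: FFT along time of each acoustic-frequency band.

    Returns an array of shape (n_acoustic_freq, n_modulation_freq).
    """
    mag = stft_magnitude(segment, frame_len, hop)
    # Remove the per-band mean so the 0 Hz modulation bin does not dominate.
    mag = mag - mag.mean(axis=1, keepdims=True)
    return np.abs(np.fft.rfft(mag, axis=1))

def segment_features(recording, n_segments=10, frame_len=512, hop=256):
    """Split a recording into equal segments and stack their modulation spectrograms.

    Trailing samples that do not fill a whole segment are dropped.
    Returns an array of shape (n_segments, n_acoustic_freq, n_modulation_freq).
    """
    seg_len = len(recording) // n_segments
    segments = [recording[i * seg_len : (i + 1) * seg_len] for i in range(n_segments)]
    return np.stack([modulation_spectrogram(s, frame_len, hop) for s in segments])

# Synthetic example (not DCASE data): a 1 kHz tone amplitude-modulated at 4 Hz.
sr = 16000
t = np.arange(10 * sr) / sr
recording = (1 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
features = segment_features(recording)
print(features.shape)  # (10, 257, 31)
```

The leading axis of the resulting tensor indexes the temporal segments, which is the axis a recurrent model such as a Conv-LSTM would step over, while the remaining two axes (acoustic frequency by modulation frequency) are what the convolutional layers would operate on.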

Keywords:

Acoustic scene classification, Convolutional Neural Network (CNN), Long Short Term Memory (LSTM), Conv-LSTM, modulation spectrogram.

Paper page and full-text download: https://civilica.com/doc/1395213/