A Novel Approach to Speaker Weight Estimation Using a Fusion of the i-vector and NFA Frameworks

Publication year: 1394 (Solar Hijri)
Document type: Journal article
Language: English

The full text of this paper is available as a 9-page PDF file.

National scientific document ID:

JR_JESS-3-1_006

Indexing date: 19 Shahrivar 1396 (Solar Hijri)

Abstract:

This paper proposes a novel approach to automatic speaker weight estimation from spontaneous telephone speech signals. In this method, each utterance is modeled using two frameworks: the i-vector framework, which is based on factor analysis of Gaussian Mixture Model (GMM) mean supervectors, and the Non-negative Factor Analysis (NFA) framework, which is based on a constrained factor analysis of GMM weight supervectors. The information available in both the Gaussian means and the Gaussian weights is then exploited through a feature-level fusion of the i-vectors and the NFA vectors. Finally, least-squares support vector regression (LSSVR) is employed to estimate the weight of speakers from the given utterances. The proposed approach is evaluated on spontaneous telephone speech signals from the National Institute of Standards and Technology (NIST) 2008 and 2010 Speaker Recognition Evaluation (SRE) corpora. To investigate its effectiveness, the method is compared with i-vector-based speaker weight estimation and with an alternative fusion scheme, namely score-level fusion. Experimental results over 2339 utterances show that the correlation coefficients between the actual and the estimated weights of female and male speakers are 0.49 and 0.56, respectively, which indicates the effectiveness of the proposed method in speaker weight estimation.
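
The pipeline described in the abstract (feature-level fusion followed by LSSVR, scored by a correlation coefficient) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the kernel, the hyperparameters, or the vector dimensions, so the RBF kernel, the values of `gamma` and `sigma`, the helper names, and the synthetic stand-in data are all assumptions made for illustration. Fusion is taken to be plain concatenation of the two per-utterance vectors, and the regressor solves the standard LS-SVM linear system.

```python
import numpy as np

def fuse_features(ivectors, nfa_vectors):
    """Feature-level fusion (assumed here to be simple concatenation)."""
    return np.hstack([ivectors, nfa_vectors])

def rbf_kernel(A, B, sigma):
    """RBF kernel matrix between rows of A and rows of B (kernel choice is an assumption)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def lssvr_fit(X, y, gamma, sigma):
    """Solve the LS-SVM regression system
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    and return the bias b and the dual coefficients alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], y])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvr_predict(X_train, b, alpha, X_new, sigma):
    """Predict targets for X_new from the fitted dual solution."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Synthetic stand-ins for real i-vectors and NFA vectors (hypothetical sizes).
rng = np.random.default_rng(0)
ivecs = rng.normal(size=(40, 8))       # 40 utterances, 8-dim "i-vectors"
nfa = rng.random(size=(40, 4))         # 40 utterances, 4-dim non-negative "NFA vectors"
y = 70.0 + 5.0 * ivecs[:, 0] + rng.normal(size=40)  # made-up weights in kg

GAMMA, SIGMA = 10.0, 3.0               # illustrative hyperparameters, not from the paper
X = fuse_features(ivecs, nfa)
b, alpha = lssvr_fit(X, y, GAMMA, SIGMA)
pred = lssvr_predict(X, b, alpha, X, SIGMA)

# Pearson correlation between actual and estimated weights (training data only here).
corr = np.corrcoef(y, pred)[0, 1]
print(f"correlation on the toy training set: {corr:.2f}")
```

In practice the correlation would be measured on held-out utterances, and `gamma` and `sigma` would be tuned on a development set; the toy data above only demonstrates the mechanics.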

Keywords:

i-vector, least-squares support vector regression, non-negative factor analysis, speaker weight estimation

Authors

Amir Hossein Poorjam

Audio Analysis Lab, AD:MT, Aalborg University, Denmark.

Mohamad Hasan Bahari

Center for Processing Speech and Images, KU Leuven, Belgium.

Hugo Van hamme

Center for Processing Speech and Images, KU Leuven, Belgium.