Statistical Machine Translation (SMT) for Highly-Inflectional Scarce-Resource Language

Saman Namdar; Hesham Faili; Shahram Khadivi

Published in: International Journal of Communications and Information Technology, Volume 5, Issue 1

Year of publication: 1391 (Solar Hijri calendar)

Document type: Journal article

Language: English

The full paper is 14 pages long and is available in PDF format.

Permanent link to this paper:

https://civilica.com/doc/1425805

National scientific document ID:

JR_ITRC-5-1_005

Indexing date: 22 Farvardin 1401 (Solar Hijri calendar; 11 April 2022)

Abstract:

Statistical Machine Translation (SMT) is a machine translation paradigm in which translations are generated on the basis of statistical models. The model parameters are derived from the analysis of a parallel corpus, and translation quality depends on how well word translations can be learned. Enriching an SMT system with a suitable morphological analyser dramatically reduces both the number of out-of-vocabulary words and the dictionary size. The effect is more pronounced for a highly inflectional, low-resource language such as Persian. Defining a suitable granularity for word segments may improve alignment quality in the parallel corpus. In this paper, different segmentation schemes and combinations of word segments are explored in a Persian-to-English SMT experiment, and the scheme that gives the best one-to-one alignment, called the En-like scheme, is proposed. Using this scheme improves Persian-to-English translation quality by about 3 BLEU points over phrase-based SMT.
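The claim that morphological segmentation reduces vocabulary size and out-of-vocabulary (OOV) words can be illustrated with a minimal sketch. It is not the analyser or the En-like scheme from the paper: the suffix list, the `segment`, `vocabulary` and `oov_rate` helpers, and the simplified romanised word forms are all assumptions made only for illustration.

```python
from typing import Iterable, List, Set

# Hypothetical suffix inventory for simplified romanised forms.
# A real system would use a morphological analyser instead of this list.
SUFFIXES = ("ha", "am", "ash")


def segment(token: str) -> List[str]:
    """Greedily strip known suffixes from the right of a token."""
    pieces: List[str] = []
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            # Keep at least three stem characters so short words stay intact.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                pieces.insert(0, "+" + suffix)
                token = token[: -len(suffix)]
                stripped = True
                break
    return [token] + pieces


def units(tokens: Iterable[str], segmented: bool) -> List[str]:
    """Return surface tokens, or their stem and suffix units if segmented."""
    result: List[str] = []
    for token in tokens:
        result.extend(segment(token) if segmented else [token])
    return result


def vocabulary(tokens: Iterable[str], segmented: bool) -> Set[str]:
    """Set of distinct units seen in the given tokens."""
    return set(units(tokens, segmented))


def oov_rate(train: Iterable[str], test: Iterable[str], segmented: bool) -> float:
    """Fraction of test units that never occur in the training data."""
    known = vocabulary(train, segmented)
    test_units = units(test, segmented)
    return sum(unit not in known for unit in test_units) / len(test_units)


# Toy data: inflected forms of 'ketab' (book) and 'medad' (pencil).
train = ["ketab", "ketabha", "ketabam", "medad", "medadha"]
test = ["medadam", "ketabash"]

print(len(vocabulary(train, segmented=False)))  # 5 surface word types
print(len(vocabulary(train, segmented=True)))   # 4 stem and suffix units
print(oov_rate(train, test, segmented=False))   # 1.0
print(oov_rate(train, test, segmented=True))    # 0.25
```

In this toy setting both test words are unseen as whole words, yet after segmentation only the suffix `+ash` remains unknown. Choosing which units to split off or keep attached is the granularity question the abstract raises.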

Keywords:

Statistical Machine Translation, Segmentation Schemes, Lexical Granularities, Morpheme, Persian Language

Authors

Saman Namdar

Hesham Faili

Shahram Khadivi