CIVILICA We Respect the Science
(Specialized publisher of national conference proceedings / Publishing license number from the Ministry of Culture and Islamic Guidance: 8971)

Farsi Conceptual Text Summarizer: A New Model in Continuous Vector Space

Paper title: Farsi Conceptual Text Summarizer: A New Model in Continuous Vector Space
National paper ID: JR_JIST-7-1_002
Published in: issue 1, volume 7, year 1398 (Solar Hijri calendar)
Authors:

Mohammad Ebrahim Khademi - Faculty of Electrical and Computer Engineering, Malek Ashtar University of Technology, Iran
Mohammad Fakhredanesh - Faculty of Electrical and Computer Engineering, Malek Ashtar University of Technology, Iran
Seyed Mojtaba Hoseini - Faculty of Electrical and Computer Engineering, Malek Ashtar University of Technology, Iran

Abstract:
Traditional summarization methods were very costly and time-consuming, which led to the emergence of automatic methods for text summarization. Extractive summarization is an automatic method that generates a summary by identifying the most important sentences of a text. In this paper, two innovative approaches are presented for summarizing Farsi texts. Both methods combine deep learning with a statistical method (TF-IDF): we cluster the concepts of the text and, based on the importance of the concepts in each sentence, extract the sentences that carry the greatest conceptual load. In doing so, we attempt to address the representational weaknesses of repetition-based statistical methods by exploiting associations between words that deep learning extracts without supervision. The first method is unsupervised and uses no hand-crafted features; it achieved state-of-the-art results on the Pasokh single-document corpus compared with the best supervised Farsi methods. To better interpret the results, we also evaluated the human summaries written by the contributing authors of the Pasokh corpus and used them as a measure of the success rate of the proposed methods. In terms of recall, the proposed methods achieved favorable results. In the second method, introducing and increasing a title-effect coefficient raised the average ROUGE-2 values by 0.4% on the Pasokh single-document corpus compared with the first method, and raised the average ROUGE-1 values by 3% on the Khabir news corpus.
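
The general idea described in the abstract (cluster words into concepts in an embedding space, weight concepts with TF-IDF, then pick the sentences that cover the most important concepts) can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy threshold clustering, the scoring formula, the function names (`cluster_concepts`, `summarize`), and the toy embeddings and IDF values are all assumptions made for illustration. A real system would presumably use embeddings trained on a large Farsi corpus and IDF values from a background collection.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_concepts(vocab, embeddings, threshold=0.8):
    """Greedy clustering (an assumption, not the paper's algorithm): each word
    joins the first cluster whose seed vector is at least `threshold` similar,
    otherwise it starts a new cluster."""
    seeds = []        # seed vector of each cluster
    concept_of = {}   # word -> cluster (concept) id
    for word in vocab:
        vec = embeddings[word]
        for cid, seed in enumerate(seeds):
            if cosine(seed, vec) >= threshold:
                concept_of[word] = cid
                break
        else:
            concept_of[word] = len(seeds)
            seeds.append(vec)
    return concept_of


def summarize(sentences, embeddings, idf, n_sentences=1, threshold=0.8):
    """Return indices (in document order) of the highest-scoring sentences.

    `sentences` is a list of token lists; `idf` is a hypothetical table of
    inverse document frequencies from a background corpus."""
    # Keep only tokens that have both an embedding and an IDF weight.
    kept = [[w for w in s if w in embeddings and w in idf] for s in sentences]
    tf = Counter(w for toks in kept for w in toks)
    concept_of = cluster_concepts(sorted(tf), embeddings, threshold)

    # Concept importance: summed TF-IDF of the words in the cluster.
    weight = defaultdict(float)
    for word, count in tf.items():
        weight[concept_of[word]] += count * idf[word]

    # Sentence score: total importance of the distinct concepts it mentions.
    scores = [sum(weight[c] for c in {concept_of[w] for w in toks})
              for toks in kept]
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:n_sentences])


# Toy, made-up data purely for illustration (not from the paper).
toy_embeddings = {
    "cat": (1.0, 0.0, 0.0), "kitten": (0.9, 0.1, 0.0),
    "stock": (0.0, 1.0, 0.0), "market": (0.1, 0.9, 0.0),
    "rain": (0.0, 0.0, 1.0),
}
toy_idf = {"cat": 1.0, "kitten": 1.0, "stock": 2.0, "market": 2.0, "rain": 0.5}
toy_sentences = [["cat", "kitten", "cat"], ["stock", "rain"], ["market", "cat"]]
selected = summarize(toy_sentences, toy_embeddings, toy_idf, n_sentences=2)
```

In this sketch a sentence is rewarded for each distinct concept it touches, so a sentence mentioning both a near-synonym pair's concept and a second concept outranks one that repeats a single concept. The title-effect coefficient of the second method is not modeled here; one could imagine multiplying the weight of concepts that occur in the title by such a coefficient, but that is an assumption about how it might be applied.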

Keywords:
Extractive Text Summarization; Unsupervised Learning; Language Independent Summarization; Continuous Vector Space; Word Embedding

Paper page and full-text download: https://civilica.com/doc/993168/