Master's thesis defense: Sense disambiguation of abbreviations and acronyms in the medical domain using data mining and machine learning algorithms

Abbreviation and acronym sense disambiguation in clinical domain with data mining and deep learning

The defense of the thesis "Sense disambiguation of abbreviations and acronyms in the medical domain using data mining and machine learning algorithms" will be held on Saturday, 30 Bahman 1400, by Zand Institute of Higher Education in Shiraz, Fars province.

Fields covered: Medicine

Organizer: Zand Institute of Higher Education, Shiraz

Abstract and keywords: Deep learning methods based on neural networks have shown promising results in word-sense disambiguation (WSD). This study proposes a bidirectional long short-term memory (Bi-LSTM) model for clinical abbreviation sense disambiguation, with a separate model trained for each abbreviation. To address the small number of samples and the class imbalance in the dataset, we used two data generation techniques: reverse substitution using the MEDAL dataset, and data augmentation by synonym substitution using pre-trained GloVe word embeddings. We used three word embeddings as features: two trained with Word2vec (one with CBOW and one with skip-gram), and one pre-trained embedding trained on PubMed, PMC, and Wikipedia. The evaluation showed that the model using the Wiki-PMC-PubMed embeddings achieved the best micro accuracy. The proposed Bi-LSTM model achieved state-of-the-art results on 13 selected abbreviations from the UMN dataset with an accuracy of 98.41%, which is up to 1.5% higher than the 2021 study by Jaber et al. on the same dataset.
Keywords: word-sense disambiguation, clinical abbreviations, deep learning, word embedding
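As an illustration of the reverse substitution step mentioned in the abstract, the following is a minimal sketch and not the author's implementation. It assumes that reverse substitution means replacing a known long form in a sentence with its abbreviation and keeping the long form as the sense label. The sense inventory `SENSES`, the example sentences, and the function names are hypothetical; the thesis itself draws its sentences from the MEDAL dataset.

```python
import re

# Hypothetical sense inventory: abbreviation -> candidate long forms.
SENSES = {"RA": ["rheumatoid arthritis", "right atrium"]}


def reverse_substitute(sentence, long_form, abbreviation):
    """Replace whole-phrase occurrences of long_form with abbreviation.

    Returns (new_sentence, sense_label), or None if long_form is absent.
    """
    pattern = re.compile(r"\b" + re.escape(long_form) + r"\b", re.IGNORECASE)
    new_sentence, count = pattern.subn(abbreviation, sentence)
    if count == 0:
        return None
    return new_sentence, long_form


def generate_samples(sentences, senses):
    """Build labeled (abbreviation, sentence, sense) training samples."""
    samples = []
    for abbreviation, long_forms in senses.items():
        for sentence in sentences:
            for long_form in long_forms:
                result = reverse_substitute(sentence, long_form, abbreviation)
                if result is not None:
                    samples.append((abbreviation, result[0], result[1]))
    return samples


# Made-up sentences, for illustration only.
sentences = [
    "The patient has a history of rheumatoid arthritis.",
    "A catheter was advanced into the right atrium.",
    "No acute distress was noted.",
]
samples = generate_samples(sentences, SENSES)
for sample in samples:
    print(sample)
```

Each generated sample pairs an abbreviated sentence with the sense it was derived from, so rare senses can be topped up with extra labeled examples before training a per-abbreviation classifier.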

Author: Manda Hosseini; Supervisor: Dr. Amirhossein Rasekh; Advisor: Dr. Amin Keshavarzi; Examiner: Dr. Beshkari

Posted on the site: 20 Bahman 1400 - viewed 645 times