Analysis of competitive endogenous RNA (ceRNA) network to find diagnostic biomarkers for gastric cancer using machine learning methods

Publication year: 1402 (Solar Hijri)
Document type: Conference paper
Language: English
Views: 77

The full text of this paper has not been provided and is not available.

National scientific document ID:

AIMS01_028

Indexing date: 1 Mordad 1402 (Solar Hijri)

Abstract:

Background and aims: Gastric cancer (GC) is the third leading cause of cancer-related death worldwide. By sequestering shared miRNAs, competitive endogenous RNAs (ceRNAs) can regulate one another and influence the development of cancer. The aim of this study was to build a diagnostic model for GC using a ceRNA network and machine learning methods.

Methods: The RNA-seq and clinical data of GC patients, comprising 335 tumor and 30 non-tumor samples, were downloaded using the TCGAbiolinks R package. Differentially expressed long non-coding RNAs (lncRNAs; DELs), miRNAs (DEmiRs), and mRNAs (DEMs) between tumor and non-tumor samples were extracted with the R package DESeq2 based on |log2 fold change| > 1 and adjusted p < 0.05. The samples were divided into low-stage (stages I and II) and high-stage (stages III and IV) groups according to their AJCC stage, and the chi-square test was used to determine the association between RNA expression and tumor stage. These stage-related genes were then used to predict miRNA–mRNA and miRNA–lncRNA interactions with the multiMiR R package and DIANA-LncBase v3.0, respectively. A lncRNA-miRNA-mRNA ceRNA network was then constructed, and the lncRNAs that entered the network were used in the machine learning steps. For machine learning, the data were split into training and test sets at a ratio of 0.7 to 0.3, and the SMOTETomek method was used to balance the number of samples in the training cohort. Feature selection was performed with Recursive Feature Elimination (RFE), and the selected features were used to build a logistic regression model.

Results: We identified 193 DELs, 15 DEmiRs, and 214 DEMs that were stage-related in GC patients. After the miRNA–mRNA and miRNA–lncRNA pairs were extracted, the ceRNA network was constructed and all 19 lncRNAs of the network were used as inputs for the machine learning steps. For model construction, samples were categorized into three groups: tumor/low-stage, tumor/high-stage, and non-tumor. After the training cohort was balanced, RFE selected four lncRNAs (ENSG00000197085, ENSG00000230002, ENSG00000274964, and ENSG00000286208) as final candidates. The resulting logistic regression model had an area under the curve (AUC) of 0.86 in the test cohort, indicating a strong ability to separate tumor from non-tumor samples and high-stage from low-stage GC samples.

Conclusion: Machine learning techniques can contribute substantially to the early diagnosis and prediction of cancer. This study successfully constructed a ceRNA network and introduced a stage-related lncRNA signature, built with machine learning approaches, that reliably separates GC patients according to their tumor stage.
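The two filtering steps named in the Methods (the |log2 fold change| > 1 and adjusted p < 0.05 cutoff, followed by a chi-square test of association with tumor stage) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: they used DESeq2 in R, and the abstract does not say how expression was categorized for the chi-square test or whether a continuity correction was applied. The sketch assumes expression is dichotomized (for example, at the median) into high and low, uses Pearson's test without correction, and all function names and numbers are hypothetical.

```python
import math


def is_differentially_expressed(log2_fc, adj_p, lfc_cutoff=1.0, p_cutoff=0.05):
    """Apply the cutoff stated in the abstract: |log2 FC| > 1 and adjusted p < 0.05."""
    return abs(log2_fc) > lfc_cutoff and adj_p < p_cutoff


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    Returns (statistic, p_value) for 1 degree of freedom.
    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical differential-expression results: gene -> (log2 fold change, adjusted p).
toy_results = {
    "geneA": (2.3, 0.001),
    "geneB": (0.4, 0.0001),   # significant, but the fold change is too small
    "geneC": (-1.8, 0.2),     # large fold change, but not significant
    "geneD": (-1.5, 0.01),
}
de_genes = sorted(
    gene for gene, (lfc, p) in toy_results.items()
    if is_differentially_expressed(lfc, p)
)

# Hypothetical contingency table for one gene:
# rows = expression (high, low), columns = stage (low-stage, high-stage).
toy_table = [[30, 10], [10, 30]]
stat, p_value = chi_square_2x2(toy_table)
stage_related = p_value < 0.05

print(de_genes, round(stat, 2), stage_related)
```

In this toy run only `geneA` and `geneD` pass the expression cutoff, and the example table gives a chi-square statistic of 20 with 1 degree of freedom, so the gene would be counted as stage-related at the 0.05 level.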

Authors

Maryam Hosseini

Department of Genetics and Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran

Bsireh Bahrami

Department of Genetics and Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran

Parvaneh Nikpour

Department of Genetics and Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran