Dimension and computation reduction approach for K-Means clustering algorithm for Big Data

This paper proposes a method for reducing the computational cost of the K-Means clustering algorithm on big data. First, the PCA algorithm reduces the dataset to one or two dimensions. Then, for each point, the distances to its two nearest centers and the changes in those distances over the last two iterations are used to increase both the speed and the quality of K-Means. Experiments on real samples show that, in the best case, the proposed method improves speed by 95.91% and quality by 99.71%. These findings show that the proposed method is very useful for big data.
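The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact update rule, so the sketch swaps in a Hamerly-style bound test that tracks each point's distance to its nearest and second-nearest centers and uses how far the centers moved between consecutive iterations to decide which points can skip the distance computation. The function names `pca_reduce` and `kmeans_two_nearest`, and all parameters, are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components=2):
    """Project X onto its top principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def _two_nearest(X, centers):
    """Return nearest-center index and distances to the two nearest centers."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    rows = np.arange(len(X))
    return order[:, 0], d[rows, order[:, 0]], d[rows, order[:, 1]]

def kmeans_two_nearest(X, k, n_iter=100, seed=0):
    """K-Means that skips points whose assignment provably cannot change.

    Assumption: a Hamerly-style test stands in for the paper's rule.
    upper[i] bounds the distance to the assigned center from above,
    lower[i] bounds the distance to the second-nearest center from below.
    Requires k >= 2.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels, upper, lower = _two_nearest(X, centers)
    skipped = 0                                  # point-iterations with no distance work
    for _ in range(n_iter):
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):                     # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers, axis=1)
        centers = new_centers
        if not shift.any():                      # no center moved: converged
            break
        upper += shift[labels]                   # assigned center may have moved away
        lower -= shift.max()                     # any other center may have moved closer
        idx = np.flatnonzero(upper > lower)      # only these points can change cluster
        skipped += len(X) - idx.size
        if idx.size:
            labels[idx], upper[idx], lower[idx] = _two_nearest(X[idx], centers)
    return labels, centers, skipped
```

A call such as `labels, centers, skipped = kmeans_two_nearest(pca_reduce(X, 2), k=3)` clusters the reduced data, and `skipped` counts how many per-point distance evaluations the bound test avoided. Reducing to one or two dimensions first makes each remaining distance computation cheaper, which is the combination the abstract describes.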

Keywords:

Improved algorithm, Modified K-Means, Clustering, Number of clusters, Initializations, Unsupervised learning, Computational time, PCA, Clustering time, Clustering quality.

Authors

Mahdi Yazdian-Dehkordi

Assistant Professor of Artificial Intelligence, Computer Engineering Department, Yazd University, Yazd, Iran

Fatemeh Moodi

Ph.D. student of Computer Engineering, Computer Engineering Department, Yazd University, Yazd, Iran


Permanent link to this paper:

https://civilica.com/doc/1453899

National scientific document ID:

DCBDP07_017

Indexing date: 7 Khordad 1401 (28 May 2022)

How to cite this paper:

To cite this paper in your own research, you can use the following entry in the references section:

Yazdian-Dehkordi, Mahdi and Moodi, Fatemeh, 1401, Dimension and computation reduction approach for K-Means clustering algorithm for Big Data, Seventh National Conference and First International Conference on Distribution Computing and Big Data Processing, Tabriz, https://civilica.com/doc/1453899

Within the text, wherever a statement or finding from this paper is mentioned, the following details are given in parentheses after the statement.
First citation: (Yazdian-Dehkordi, Mahdi; Fatemeh Moodi, 1401)
Subsequent citations: (Yazdian-Dehkordi; Moodi, 1401)

Scientometrics and paper ranking

Details of the institution that produced this paper:

Institution: Yazd University

Institution type: Public university

Number of indexed papers: 17,626
