Dalian University of Technology faculty homepage: Xin Yang: Optimized big data K-means clustering using MapReduce

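Xin Yang (杨鑫)

Professor; doctoral supervisor; master's supervisor
Main position: Deputy Director, Office of Discipline Development
Gender: Male
Alma mater: Zhejiang University
Degree: Ph.D.
Affiliation: School of Computer Science and Technology
Disciplines: Computer Application Technology; Computer Software and Theory; Computer Architecture
Office: Room B1015, Innovation Park Building (创新园大厦)
Email: xinyang@dlut.edu.cn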

Optimized big data K-means clustering using MapReduce

Paper type: Journal article

Publication date: 2014-12-01

Journal: JOURNAL OF SUPERCOMPUTING

Indexed in: SCIE, EI

Volume: 70

Issue: 3

Pages: 1249-1259

ISSN: 0920-8542

Keywords: K-means; MapReduce; Sampling; Performance

Abstract: Clustering analysis is one of the most commonly used data processing techniques. For more than half a century, K-means has remained the most popular clustering algorithm because of its simplicity. Recently, as data volumes continue to rise, some researchers have turned to MapReduce for high performance. However, MapReduce is ill-suited to iterative algorithms, because every iteration restarts a job and then rereads and reshuffles a large volume of data. In this paper, we address the problem of processing large-scale data with the K-means clustering algorithm and propose a novel processing model in MapReduce that eliminates the iteration dependence and achieves high performance. We analyze and implement our idea. Extensive experiments on our cluster demonstrate that the proposed methods are efficient, robust, and scalable.
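
To make the iteration cost mentioned in the abstract concrete, the following is a minimal single-process sketch of the conventional iterative K-means expressed as map and reduce steps. It is not the model proposed in the paper, which the abstract does not describe in detail. All function names (`map_assign`, `reduce_mean`, `kmeans_iteration`, `kmeans`) are hypothetical, and the in-memory grouping only stands in for a real MapReduce shuffle.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(data, centroids):
    """One simulated job: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for point in data:                 # full pass over the data set
        key, value = map_assign(point, centroids)
        groups[key].append(value)      # stands in for the shuffle phase
    # A centroid with no assigned points keeps its previous position.
    return [reduce_mean(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]


def kmeans(data, centroids, max_iters=20, tol=1e-9):
    """Driver loop: each pass corresponds to launching one more job."""
    jobs = 0
    for _ in range(max_iters):
        new_centroids = kmeans_iteration(data, centroids)
        jobs += 1
        shift = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, jobs
```

In this sketch every pass of the driver loop rereads all of `data` and regroups it, which mirrors the repeated job restarts, data reading, and shuffling that the abstract identifies as the bottleneck.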
