Optimized big data K-means clustering using MapReduce

Indexed by:Journal Article

Date of Publication:2014-12-01

Journal:JOURNAL OF SUPERCOMPUTING

Included Journals:SCIE, EI

Volume:70

Issue:3

Page Number:1249-1259

ISSN No.:0920-8542

Key Words:K-means; MapReduce; Sampling; Performance

Abstract:Clustering analysis is one of the most commonly used data processing techniques. For more than half a century, K-means has remained the most popular clustering algorithm because of its simplicity. Recently, as data volumes continue to rise, some researchers have turned to MapReduce for high performance. However, MapReduce is unsuitable for iterative algorithms, because every iteration restarts a job and rereads and reshuffles the large data set. In this paper, we address the problems of processing large-scale data with the K-means clustering algorithm and propose a novel processing model in MapReduce that eliminates the iteration dependence and obtains high performance. We analyze and implement our idea. Extensive experiments on our cluster demonstrate that the proposed methods are efficient, robust and scalable.
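
To illustrate the iteration dependence the abstract refers to, the sketch below shows one pass of conventional K-means written as a map step and a reduce step in plain Python. It is not the model proposed in the paper, only the standard iterative formulation, and the names map_point, reduce_cluster and kmeans_iteration are hypothetical. In a real MapReduce deployment each call to kmeans_iteration would correspond to a separate job that rereads and reshuffles the whole data set.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step: average all points assigned to one centroid."""
    count = sum(n for _, n in values)
    dims = len(values[0][0])
    return tuple(sum(p[d] for p, _ in values) / count for d in range(dims))


def kmeans_iteration(points, centroids):
    """One full pass over the data; in MapReduce this would be one job."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)
    # Keep the old centroid when no point was assigned to it.
    return [
        reduce_cluster(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


# Toy data (made up for illustration): two obvious groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, centroids)
```

Because the centroids produced by one pass are the input to the next, the passes cannot run independently, which is the dependence the paper sets out to remove.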