Maximum triangle rule and semi-supervision based k-means algorithm for large scale data
Release time:2019-03-11
Indexed by:
Journal article
First Author:
Feng J.
Co-author:
Lu Z., Zhang Z.
Date of Publication:
2015-01-01
Journal:
ICIC Express Letters
Included Journals:
EI, Scopus
Document Type:
J
Volume:
9
Issue:
6
Page Number:
1553-1558
ISSN No.:
1881-803X
Abstract:
Clustering algorithms that must repeatedly scan the whole data set cannot handle the clustering analysis of large-scale data sets well. Moreover, because they are sensitive to initialization parameters and to the data distribution, some of them produce low-quality clustering results. To address these problems, this paper presents a Maximum Triangle Rule and Semi-Supervision based k-means algorithm (MTRSSKM). MTRSSKM applies the maximum triangle rule to choose the initial cluster centers for the k-means clustering algorithm, and uses a small number of labels retained in memory to supervise and guide the clustering process. MTRSSKM needs to scan the original data set only once. The clustering quality of MTRSSKM is improved, and the one-scan design accelerates its clustering process. An experiment on the 1998KDD data set shows the effectiveness of MTRSSKM. © 2015 ICIC International.
Translation or Not:
no
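
Illustrative sketch:
The abstract describes a k-means variant that is guided by a few labeled points and reads the data only once. The Python sketch below illustrates that general idea only and is not the authors' MTRSSKM. The abstract does not define the maximum triangle rule, so the sketch substitutes a different, clearly simpler initialization (the mean of the labeled points of each class), and it assumes a running-mean center update during the single pass. The function names seed_centers and one_scan_kmeans are hypothetical.

```python
import math
from collections import defaultdict


def seed_centers(labeled):
    """Average the labeled points of each class to get one starting center per class.

    Assumption: this stands in for the paper's maximum triangle rule,
    which the abstract names but does not define.
    """
    groups = defaultdict(list)
    for point, label in labeled:
        groups[label].append(point)
    centers, counts = {}, {}
    for label, pts in groups.items():
        dim = len(pts[0])
        centers[label] = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]
        counts[label] = len(pts)
    return centers, counts


def one_scan_kmeans(stream, labeled):
    """Cluster `stream` in a single pass, starting from label-derived centers."""
    centers, counts = seed_centers(labeled)
    assignments = []
    for point in stream:
        # Assign the point to the nearest current center (Euclidean distance).
        nearest = min(centers, key=lambda c: math.dist(point, centers[c]))
        # Running-mean update: the center moves toward the new point,
        # so no earlier point has to be revisited.
        counts[nearest] += 1
        step = 1.0 / counts[nearest]
        centers[nearest] = [m + step * (x - m)
                            for m, x in zip(centers[nearest], point)]
        assignments.append(nearest)
    return centers, assignments


if __name__ == "__main__":
    labeled = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
    stream = [(1.0, 1.0), (9.0, 9.0), (0.0, 2.0)]
    centers, assignments = one_scan_kmeans(stream, labeled)
    print(assignments)
    print(centers)
```

Because each center is kept as a running mean, memory use depends on the number of clusters and labeled seeds rather than on the size of the data set, which is the property a one-scan method relies on.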