Incomplete high-dimensional data imputation algorithm using feature selection and clustering analysis on cloud

Release Time: 2019-03-13

Publication Type: Journal Article

Date of Publication: 2016-08-01

Journal: JOURNAL OF SUPERCOMPUTING

Indexed in: Scopus, EI, SCIE

Volume: 72

Issue: 8

Page Number: 2977-2990

ISSN: 0920-8542

Key Words: High-dimensional data; Incomplete data imputation; Feature subset selection; Clustering analysis

Abstract: Incomplete data imputation plays an important role in big data analysis and smart computing. Existing algorithms impute incomplete high-dimensional data with low efficiency and limited effectiveness. This paper proposes IHDIFC, an incomplete high-dimensional data imputation algorithm based on feature selection and cluster analysis, which works in three steps. First, a hierarchical clustering-based feature subset selection algorithm reduces the dimensionality of the data set. Second, a parallel k-means algorithm based on partial distance efficiently clusters the data restricted to the selected feature subset. Finally, the data objects that share a cluster with the target object are used to estimate its missing feature values. Extensive experiments compare IHDIFC with two representative missing-data imputation algorithms, FIMUS and DMI. The results show that the proposed algorithm achieves better imputation accuracy and takes significantly less time than these algorithms when imputing high-dimensional data.
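The second and third steps can be illustrated with a small, sequential sketch. It is not the authors' implementation: it assumes missing entries are encoded as `None`, that feature selection (step 1) has already produced the columns passed in, that the first k rows (gaps filled with column means) seed the centroids, and that a missing value is estimated as the mean of that feature over the object's cluster members that observe it. The abstract does not specify the estimator, the initialization, or the parallelization, so `partial_distance`, `partial_kmeans`, and `impute` are hypothetical names. The distance used here rescales the squared Euclidean distance over an object's observed features by d / (number of observed features), a common partial-distance strategy for k-means on incomplete data.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]  # None marks a missing value (assumption)


def column_means(data: List[Row]) -> List[float]:
    """Mean of the observed values in each column (0.0 if a column is all missing)."""
    means = []
    for j in range(len(data[0])):
        vals = [row[j] for row in data if row[j] is not None]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return means


def partial_distance(x: Row, c: List[float]) -> float:
    """Squared Euclidean distance over the features observed in x,
    scaled by d / (number of observed features) to compensate for gaps."""
    pairs = [(xi, ci) for xi, ci in zip(x, c) if xi is not None]
    if not pairs:
        return 0.0  # nothing observed: every centroid is equally close
    return sum((xi - ci) ** 2 for xi, ci in pairs) * len(c) / len(pairs)


def partial_kmeans(
    data: List[Row], k: int, iters: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Sequential k-means that clusters incomplete rows without filling gaps."""
    if not 0 < k <= len(data):
        raise ValueError("k must be between 1 and the number of rows")
    means = column_means(data)
    # Hypothetical initialisation: the first k rows, gaps filled with column means.
    centroids = [[m if v is None else v for v, m in zip(row, means)] for row in data[:k]]
    labels = [-1] * len(data)
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: partial_distance(row, centroids[c]))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update each centroid feature from the members that observe it;
        # keep the previous value if no member observes that feature.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            for j in range(len(means)):
                vals = [row[j] for row in members if row[j] is not None]
                if vals:
                    centroids[c][j] = sum(vals) / len(vals)
    return labels, centroids


def impute(data: List[Row], k: int) -> List[List[float]]:
    """Fill each gap with the row's cluster centroid value: the mean of that
    feature over cluster members that observe it (or the centroid's previous
    value if none do). Observed values are left unchanged."""
    labels, centroids = partial_kmeans(data, k)
    return [
        [centroids[lab][j] if v is None else v for j, v in enumerate(row)]
        for row, lab in zip(data, labels)
    ]


if __name__ == "__main__":
    data = [
        [1.0, 1.0, 1.0],
        [9.0, 9.0, 9.0],
        [1.2, None, 0.8],
        [8.8, 9.2, None],
    ]
    print(impute(data, 2))
    # [[1.0, 1.0, 1.0], [9.0, 9.0, 9.0], [1.2, 1.0, 0.8], [8.8, 9.2, 9.0]]
```

In the usage example, the two incomplete rows join the clusters seeded by rows 0 and 1, so their gaps are filled with 1.0 and 9.0 respectively. A parallel variant in the spirit of the paper could distribute the assignment step across workers and merge per-cluster sums and counts before the centroid update; this sketch does not attempt that.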
