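Personal Information
Professor
Doctoral Supervisor
Master's Supervisor
Main position: teaching
Gender: Male
Alma mater: Chongqing University
Degree: Doctorate
Affiliation: School of Software; International School of Information and Software
Disciplines: Software Engineering; Computer Software and Theory
Office: Room 405, Comprehensive Building, Development Zone
Contact: Email: zkchen@dlut.edu.cn Mobile: 13478461921 WeChat: 13478461921 QQ: 1062258606
Email: zkchen@dlut.edu.cn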
Incomplete high-dimensional data imputation algorithm using feature selection and clustering analysis on cloud
Paper type: Journal article
Publication date: 2016-08-01
Journal: JOURNAL OF SUPERCOMPUTING
Indexed in: SCIE, EI, Scopus
Volume: 72
Issue: 8
Pages: 2977-2990
ISSN: 0920-8542
Keywords: High-dimensional data; Incomplete data imputation; Feature subset selection; Clustering analysis
Abstract: Incomplete data imputation plays an important role in big data analysis and smart computing. Existing algorithms are of low efficiency and effectiveness in imputing incomplete high-dimensional data. The paper proposes an incomplete high-dimensional data imputation algorithm based on feature selection and cluster analysis (IHDIFC), which works in three steps. First, a hierarchical clustering-based feature subset selection algorithm is designed to reduce the dimensions of the data set. Second, a parallel k-means algorithm based on partial distance is derived to cluster the selected data subset efficiently. Finally, the data objects in the same cluster as the target are utilized to estimate its missing feature values. Extensive experiments are carried out to compare IHDIFC to two representative missing data imputation algorithms, namely FIMUS and DMI. The results demonstrate that the proposed algorithm achieves better imputation accuracy and takes significantly less time than the other algorithms for imputing high-dimensional data.
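The second and third steps described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it is serial rather than parallel, it skips the feature selection step, and it assumes the commonly used form of partial distance (squared differences summed over observed features, rescaled by the ratio of total to observed dimensions) and cluster-mean filling. The names `partial_distance`, `kmeans_partial`, and `impute` are hypothetical, and missing values are represented as `None`.

```python
import math


def partial_distance(x, c):
    """Distance over the features observed in x, rescaled to the full dimension.

    Assumed form of partial distance; the paper's exact definition may differ.
    """
    observed = [j for j, v in enumerate(x) if v is not None]
    if not observed:
        return math.inf  # nothing observed, so no centroid is preferred
    sq = sum((x[j] - c[j]) ** 2 for j in observed)
    return math.sqrt(sq * len(x) / len(observed))


def kmeans_partial(data, centroids, iterations=10):
    """Serial k-means that skips missing values when assigning and updating."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assignment step: nearest centroid under partial distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: partial_distance(x, centroids[k]))
            for x in data
        ]
        # Update step: per-feature mean over observed values in each cluster.
        for k in range(len(centroids)):
            for j in range(len(centroids[k])):
                vals = [x[j] for x, lab in zip(data, labels)
                        if lab == k and x[j] is not None]
                if vals:
                    centroids[k][j] = sum(vals) / len(vals)
    return labels, centroids


def impute(data, labels, centroids):
    """Fill each missing value with its cluster's centroid value (an assumption)."""
    return [
        [centroids[lab][j] if v is None else v for j, v in enumerate(x)]
        for x, lab in zip(data, labels)
    ]


# Toy data with two well-separated groups and one missing value in each.
data = [[1.0, 2.0], [1.2, None], [8.0, 9.0], [None, 9.4]]
labels, centroids = kmeans_partial(data, [[1.0, 2.0], [8.0, 9.0]])
filled = impute(data, labels, centroids)
```

In this toy run each incomplete object is assigned using only its observed feature, then completed from the objects that share its cluster. A parallel variant would distribute the assignment step across workers, which is where the abstract's efficiency claim would apply.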