Dalian University of Technology
Xianchao Zhang (张宪超)
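Professor; Doctoral Supervisor; Master's Supervisor

Gender: Male

Alma mater: University of Science and Technology of China

Degree: Doctorate

Affiliation: School of Software; International School of Information and Software (国际信息与软件学院)

Disciplines: Computer Application Technology; Software Engineering

Email: xczhang@dlut.edu.cn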

Constraint based dimension correlation and distance divergence for clustering high-dimensional data

Paper type: Conference paper

Publication date: 2010-12-14

Indexed in: EI, Scopus

Pages: 629-638

Abstract: Clusters are hidden in subspaces of high-dimensional data, i.e., only a subset of features is relevant to each cluster. Subspace clustering is challenging because the search for each cluster's relevant features and the detection of the final clusters are circularly dependent and must be solved simultaneously. In this paper, we point out that feature correlation and distance divergence are important to subspace clustering, yet neither has been considered in previous work. Feature correlation groups correlated features independently and thus helps reduce the search space of the relevant-feature search problem. Distance divergence distinguishes distances on different dimensions and helps find the final clusters accurately. We tackle the two problems with the aid of a small amount of domain knowledge in the form of must-links and cannot-links. We then devise a semi-supervised subspace clustering algorithm, CDCDD. CDCDD integrates our solutions to the feature correlation and distance divergence problems and uses an adaptive dimension voting scheme derived from a previous unsupervised subspace clustering algorithm, FINDIT. Experimental results on both synthetic and real data sets show that CDCDD outperforms FINDIT in terms of accuracy, and outperforms the other constraint-based algorithm, SCMINER, in terms of both accuracy and efficiency. © 2010 IEEE.
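
The abstract does not give the CDCDD procedure itself, so the following is only a rough illustration of how pairwise constraints might inform per-dimension distances. A must-link says two points belong to the same cluster; a cannot-link says they belong to different clusters. In this sketch, a dimension scores highly when cannot-link pairs are far apart on it and must-link pairs are close. The function names `dimension_scores` and `weighted_distance` and the ratio-based score are assumptions made for this example, not the paper's method.

```python
import numpy as np


def dimension_scores(X, must_links, cannot_links):
    """Score each dimension using pairwise constraints (higher = more relevant).

    Hypothetical heuristic for illustration only; NOT the CDCDD scoring rule.
    X: (n_points, n_dims) array; must_links / cannot_links: lists of index pairs.
    """
    X = np.asarray(X, dtype=float)
    ml = np.asarray(must_links)
    cl = np.asarray(cannot_links)
    # Mean absolute per-dimension gap across must-link pairs (ideally small).
    ml_gap = np.abs(X[ml[:, 0]] - X[ml[:, 1]]).mean(axis=0)
    # Mean absolute per-dimension gap across cannot-link pairs (ideally large).
    cl_gap = np.abs(X[cl[:, 0]] - X[cl[:, 1]]).mean(axis=0)
    eps = 1e-12  # avoids division by zero when must-link pairs coincide
    return cl_gap / (ml_gap + eps)


def weighted_distance(x, y, weights):
    """Euclidean distance in which each dimension is scaled by its weight."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum(np.asarray(weights, dtype=float) * diff ** 2)))
```

For instance, with four 2-D points whose first coordinate separates two groups and whose second coordinate is noise, `dimension_scores` gives the first dimension a much higher score, and normalizing the scores yields weights that can be passed to `weighted_distance`.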
