Indexed by:Conference paper
Date of Publication:2017-01-01
Included Journals:EI, CPCI-S, Scopus
Page Number:10777-10782
Key Words:Incomplete dataset; correlation; KNN; FCM
Abstract:When clustering incomplete datasets, data points on cluster borders (border data) are more likely to be misclassified. To address this problem, the proposed algorithm focuses on re-classifying "suspected misclassified" border data (abbreviated as SM border data). Based on the preliminary clustering results of a classical FCM-based algorithm for incomplete data and the KNN (k nearest neighbor) principle, a simple method for detecting SM border data is given. The correlation of attributes is then used as a new similarity measure to re-classify the SM border data. By increasing the clustering accuracy of SM border data, the clustering performance on incomplete datasets can thus be improved. In experiments on an artificial dataset that fits the shifting-scaling model and on two real datasets, we show that our algorithm outperforms the classical FCM-based algorithms for incomplete data. The experimental results also indicate that our method can be applied to complete datasets.
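
The two steps named in the abstract (KNN-based detection of SM border data, then re-classification by attribute correlation) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the detection rule or the exact correlation measure, so the sketch assumes (1) the data are complete or already imputed, (2) preliminary hard labels are available, for example from an FCM run, (3) a point is "suspected" when fewer than `agree_ratio` of its `k` nearest neighbors share its label, and (4) re-classification picks the cluster whose centroid has the highest Pearson correlation with the point. The names `detect_sm_border`, `reclassify_by_correlation`, `k`, and `agree_ratio` are hypothetical.

```python
import numpy as np


def detect_sm_border(X, labels, k=3, agree_ratio=0.5):
    """Flag points whose k nearest neighbors mostly carry a different label.

    Assumed rule (not from the paper): a point is suspected when the share
    of its k nearest neighbors with the same label is below agree_ratio.
    """
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    agreement = (labels[neighbors] == labels[:, None]).mean(axis=1)
    return agreement < agree_ratio


def reclassify_by_correlation(X, labels, suspect):
    """Reassign suspected points using Pearson correlation with centroids.

    Assumed rule (not from the paper): centroids are built from
    non-suspected points only, and each suspected point goes to the
    cluster whose centroid correlates most strongly with it.
    Every cluster must keep at least one non-suspected point.
    """
    new_labels = labels.copy()
    clusters = np.unique(labels)
    centroids = np.array(
        [X[(labels == c) & ~suspect].mean(axis=0) for c in clusters]
    )
    for i in np.flatnonzero(suspect):
        corr = [np.corrcoef(X[i], m)[0, 1] for m in centroids]
        new_labels[i] = clusters[int(np.argmax(corr))]
    return new_labels
```

Correlation is insensitive to shifting and scaling of an attribute profile, which is why it can separate points that Euclidean distance places near the wrong cluster: in this sketch a point such as `[2, 4, 6, 8]` correlates perfectly with a centroid `[1.5, 2.5, 3.5, 4.5]` even though the two are not close in distance.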