Indexed by:Journal article
Date of Publication:2013-01-01
Journal:NEUROCOMPUTING
Included Journals:SCIE, EI, Scopus
Volume:99
Page Number:124-133
ISSN No.:0925-2312
Key Words:Semi-supervised; Elite pairwise constraints; Clustering
Abstract:Semi-supervised clustering under pairwise constraints (i.e., must-links and cannot-links) has been a hot topic in the data mining community in recent years. Since pairwise constraints provided by distinct domain experts may conflict with each other, considerable research has been conducted to evaluate the effects of such noise on semi-supervised clustering. In this paper, we introduce elite pairwise constraints, including elite must-link (EML) and elite cannot-link (ECL) constraints. In contrast to traditional constraints, both EML and ECL constraints are required to be satisfied in every optimal partition (i.e., a partition that minimizes the criterion function). Therefore, these new constraints cannot conflict with one another. First, we prove that it is NP-hard to obtain EML or ECL constraints. Then, a heuristic method named Limit Crossing is proposed to obtain a fraction of these constraints. In practice, this method always retrieves many EML or ECL constraints. To evaluate the effectiveness of Limit Crossing, multi-partition based and distance based methods are also proposed in this paper to generate faux elite pairwise constraints. Extensive experiments have been conducted on both UCI and synthetic data sets using a semi-supervised clustering algorithm named COP-KMedoids. Experimental results demonstrate that COP-KMedoids under EML and ECL constraints generated by Limit Crossing can outperform COP-KMedoids under either faux constraints or no constraints. © 2012 Elsevier B.V. All rights reserved.
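The definition of elite constraints in the abstract can be illustrated on a toy data set. The sketch below is not the paper's Limit Crossing heuristic: it is a brute-force enumeration that only makes the definition concrete. The function names `partition_cost` and `elite_constraints`, the 1-D integer points, the absolute-difference distance, and the k-medoids style criterion (total distance from each point to the best medoid of its cluster) are all assumptions made for illustration.

```python
from itertools import combinations, product


def partition_cost(points, labels, k):
    """Assumed criterion: for each cluster, total distance to its best medoid."""
    total = 0
    for cluster in range(k):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        if not members:
            return float("inf")  # reject labelings that leave a cluster empty
        total += min(sum(abs(m - q) for q in members) for m in members)
    return total


def elite_constraints(points, k):
    """Return (EML, ECL) index pairs by enumerating every k-way labeling."""
    best_cost, optimal = float("inf"), []
    for labels in product(range(k), repeat=len(points)):
        cost = partition_cost(points, labels, k)
        if cost < best_cost:
            best_cost, optimal = cost, [labels]
        elif cost == best_cost:
            optimal.append(labels)
    pairs = list(combinations(range(len(points)), 2))
    # A pair is elite only if it holds in every optimal partition.
    eml = [(i, j) for i, j in pairs if all(l[i] == l[j] for l in optimal)]
    ecl = [(i, j) for i, j in pairs if all(l[i] != l[j] for l in optimal)]
    return eml, ecl


eml, ecl = elite_constraints([0, 1, 10, 11], k=2)
print(eml)  # [(0, 1), (2, 3)]
print(ecl)  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

The enumeration visits k^n labelings, so it is only usable on a handful of points. That exponential cost is in keeping with the NP-hardness result stated in the abstract and is the reason a heuristic is needed in practice. When several partitions tie for the optimum, as with the points 0, 1, 2 and k = 2, fewer pairs qualify as elite.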