Jin Bo (金博)

Personal Information

Professor

Doctoral Supervisor

Master's Supervisor

Gender: Male

Alma mater: Dalian University of Technology

Degree: Ph.D.

Affiliation: School of Innovation and Entrepreneurship

Discipline: Computer Application Technology

Office: Maker Space 607

Email: jinbo@dlut.edu.cn


Publications


Learning a Distance Metric by Balancing KL-Divergence for Imbalanced Datasets


Paper type: Journal article

Date of publication: 2019-12-01

Journal: IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS

Indexed in: EI, SCIE

Volume: 49

Issue: 12

Pages: 2384-2395

ISSN: 2168-2216

Keywords: distance metric by balancing KL-divergence (DMBK); distance metric learning (DML); geometric mean; imbalanced dataset

Abstract: Datasets with imbalanced class distributions occur frequently in many real-world domains and can degrade the performance of various machine learning tasks. Among these tasks, learning classifiers from imbalanced datasets is an important topic. To perform it well, it is crucial to train a distance metric that accurately measures similarities between samples drawn from imbalanced datasets. Unfortunately, existing distance metric learning methods, such as large margin nearest neighbor and information-theoretic metric learning, focus on distances between individual samples and fail to take imbalanced class distributions into account. Such metrics naturally favor the majority classes, which satisfy their objective functions more easily, while the important minority classes are neglected during metric construction; this severely affects the decision systems of most classifiers. Learning a distance metric suited to imbalanced datasets is therefore vitally important but challenging. To solve this problem, this paper proposes a novel distance metric learning method named distance metric by balancing KL-divergence (DMBK). DMBK defines normalized divergences based on the KL-divergence to describe the distinctions between different classes, then combines them through their geometric mean so that samples from all classes are separated simultaneously. This procedure separates the classes in a balanced way and avoids the inaccurate similarities induced by imbalanced class distributions. Experiments on various imbalanced datasets verify the excellent performance of the proposed method.
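The balanced objective described in the abstract can be sketched as follows. This is a minimal illustration only, assuming Gaussian class-conditional models in the transformed space and a fixed linear map `L`; the function names and these modeling choices are assumptions for illustration, not the paper's exact formulation:

```python
# Illustrative sketch of a DMBK-style "balanced KL" objective.
# Assumptions (not from the paper): each class is modeled as a Gaussian in
# the space x -> L @ x, and the metric L is given rather than optimized.
import numpy as np

def gaussian_kl(m1, S1, m2, S2):
    """Closed-form KL divergence KL(N(m1, S1) || N(m2, S2))."""
    k = m1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = m2 - m1
    return 0.5 * (np.trace(S2_inv @ S1) + diff @ S2_inv @ diff - k
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

def balanced_kl_objective(X, y, L):
    """Geometric mean of the normalized pairwise class KL-divergences,
    computed in the transformed space x -> L @ x."""
    Z = X @ L.T
    classes = np.unique(y)
    # Per-class Gaussian statistics; small ridge keeps covariances invertible.
    stats = {c: (Z[y == c].mean(axis=0),
                 np.cov(Z[y == c], rowvar=False) + 1e-6 * np.eye(Z.shape[1]))
             for c in classes}
    divs = np.array([gaussian_kl(*stats[a], *stats[b])
                     for a in classes for b in classes if a != b])
    norm = divs / divs.sum()                    # normalized divergences
    return float(np.exp(np.mean(np.log(norm)))) # geometric mean
```

Because the geometric mean is largest when all normalized divergences are equal, maximizing this quantity over `L` pushes every pair of classes apart at comparable rates, so a minority class cannot be sacrificed to enlarge an already-easy majority-class separation. For example, on an imbalanced two-class sample one could evaluate `balanced_kl_objective(X, y, np.eye(2))` to score the identity metric.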