Yang Jie (杨洁)

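Personal Information

Associate Professor

Doctoral Supervisor

Master's Supervisor

Gender: Female

Alma Mater: Dalian University of Technology

Degree: Ph.D.

Affiliation: School of Mathematical Sciences

Discipline: Computational Mathematics

Office: Room B1405, Innovation Park Building, Dalian University of Technology

Contact: 0411-84708351-8205

Email: yangjiee@dlut.edu.cn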

Publications

Learning imbalanced datasets based on SMOTE and Gaussian distribution

Paper Type: Journal Article

Publication Date: 2020-02-01

Journal: INFORMATION SCIENCES

Indexed In: EI, SCIE

Volume: 512

Pages: 1214-1233

ISSN: 0020-0255

Keywords: Imbalanced; Oversample; Gaussian distribution; SMOTE

Abstract: The learning of imbalanced datasets is a ubiquitous challenge for researchers in the fields of data mining and machine learning. Conventional classifiers are often biased towards the majority class, and loss functions attempt to optimize the quantities. In this paper, we present two effective sampling methods that improve the data distributions. One rebalanced method, the Adaptive-SMOTE, improves the SMOTE method by adaptively selecting groups of Inner and Danger data from the minority class such that a new minority class is compiled based on the selected data, thus preventing an expansion of the category boundary and strengthening the distributional characteristics of the original data. The other method, Gaussian Oversampling, combines dimension reduction with the Gaussian distribution, which makes the tail of the Gaussian distribution thinner. Cross-validation experiments on 15 datasets show that the two sampling methods achieve significant improvements compared with other typical methods. The Adaptive-SMOTE has higher F-measure and Acc values than other existing sampling methods and higher robustness to classifiers and datasets with different values of imb. Gaussian Oversampling is more efficient when dealing with extremely imbalanced classifications. (C) 2019 Elsevier Inc. All rights reserved.
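
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of plain SMOTE interpolation using NumPy only. It is not the paper's Adaptive-SMOTE: the abstract does not give the rules for selecting the Inner and Danger groups, so none are implemented here. The function name `smote_sketch` and its parameters `n_new`, `k`, and `seed` are hypothetical names chosen for illustration.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Baseline SMOTE sketch: interpolate between minority points and their neighbours.

    Illustrative only; this is NOT the Adaptive-SMOTE method from the paper.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # a point has at most n - 1 other points as neighbours

    # Pairwise squared Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour

    # Indices of the k nearest minority neighbours of every minority point.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                    # random base points
    nb = neighbours[base, rng.integers(0, k, size=n_new)]    # one neighbour per base point
    gap = rng.random((n_new, 1))                             # interpolation weight in [0, 1)

    # Each synthetic sample lies on the segment between a base point and its neighbour.
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

Calling `smote_sketch(X_min, n_new=20, k=3)` on an array of minority-class rows returns 20 synthetic rows with the same number of columns. Because every synthetic point is a convex combination of two existing minority points, this baseline never generates samples outside the bounding box of the minority class.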