
Balanced sampling method for imbalanced big data using AdaBoost


Indexed by:Conference paper

Date of Publication:2015-01-12

Included Journals:EI, Scopus

Page Number:189-194

Abstract:With the arrival of the big data era, processing large volumes of data at much higher speeds has become more urgent and has attracted increasing attention. Furthermore, many real-world data applications exhibit severe class distribution skew, and the underrepresented classes are usually the ones of interest to researchers. Variants of the boosting algorithm have been developed to cope with the class imbalance problem. However, because boosting is inherently sequential, these methods cannot be directly applied to handle large-scale data efficiently. In this paper, we propose a new parallelized version of boosting, AdaBoost.Balance, to deal with imbalanced big data. It adopts a new balanced sampling method that combines undersampling with oversampling and can be computed simultaneously by multiple computing nodes to construct a final ensemble classifier. Consequently, it is easily implemented on parallel big data processing platforms such as the MapReduce framework.
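
The abstract does not give the details of the sampling procedure, so the following is only a minimal sketch of one possible reading of "balanced sampling": the majority class is undersampled without replacement, the minority class is oversampled with replacement, and one independent balanced sample is drawn per computing node. The function names `balanced_sample` and `balanced_partitions`, the `per_class` and `n_nodes` parameters, and the 0/1 label encoding are all assumptions made for illustration; they are not taken from the paper, and the boosting step itself is not shown.

```python
import random
from collections import Counter


def balanced_sample(majority, minority, per_class, seed=0):
    """Return a shuffled list of (example, label) pairs.

    Hypothetical helper, not the paper's algorithm: label 0 marks the
    majority class and label 1 the minority class.
    """
    rng = random.Random(seed)
    # Undersampling: keep a random subset of the majority class
    # (capped at the class size, so very small inputs may stay unbalanced).
    kept_majority = rng.sample(majority, min(per_class, len(majority)))
    # Oversampling: draw minority examples with replacement.
    drawn_minority = rng.choices(minority, k=per_class)
    sample = [(x, 0) for x in kept_majority] + [(x, 1) for x in drawn_minority]
    rng.shuffle(sample)
    return sample


def balanced_partitions(majority, minority, per_class, n_nodes):
    """Draw one independent balanced sample per (assumed) computing node."""
    return [
        balanced_sample(majority, minority, per_class, seed=node_id)
        for node_id in range(n_nodes)
    ]


if __name__ == "__main__":
    majority = list(range(1000))       # 1000 majority-class examples
    minority = list(range(1000, 1020))  # 20 minority-class examples
    for part in balanced_partitions(majority, minority, per_class=50, n_nodes=4):
        print(Counter(label for _, label in part))
```

Under this reading, each partition could be handed to a separate mapper that trains its own weak learners, with a reducer combining them into the final ensemble; how AdaBoost.Balance actually weights and merges the learners is described in the paper, not here.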
