顾宏
Paper type: Conference paper
Publication date: 2015-01-12
Indexed in: EI, Scopus
Pages: 189-194
Abstract: With the arrival of the era of big data, processing large volumes of data at much faster rates has become more urgent and has attracted increasing attention. Furthermore, many real-world data applications present severe class distribution skews, and the underrepresented classes are usually the ones of concern to researchers. Variants of the boosting algorithm have been developed to cope with the class imbalance problem. However, due to the inherently sequential nature of boosting, these methods cannot be directly applied to handle large-scale data efficiently. In this paper, we propose a new parallelized version of boosting, AdaBoost.Balance, to deal with imbalanced big data. It adopts a new balanced sampling method that combines undersampling with oversampling and can be computed simultaneously by multiple computing nodes to construct a final ensemble classifier. Consequently, it can be easily implemented on parallel big-data processing platforms such as the MapReduce framework.
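The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes, not the AdaBoost.Balance method itself. It assumes labels in {-1, +1} and a single numeric feature. Each data partition is balanced by undersampling the larger class and oversampling the smaller one, a learner is trained on each partition independently (the "map" step), and the learners are combined by majority vote (the "reduce" step). The names `balanced_sample`, `train_stump`, `map_phase`, and `reduce_phase` are hypothetical, and a plain decision stump stands in for the boosted learner.

```python
import random

def balanced_sample(rows, per_class, rng):
    """Draw per_class rows from every class: undersample (without replacement)
    classes with at least per_class rows, oversample (with replacement) the rest."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append((x, y))
    sample = []
    for label in sorted(by_label):
        members = by_label[label]
        if len(members) >= per_class:
            sample.extend(rng.sample(members, per_class))
        else:
            sample.extend(rng.choices(members, k=per_class))
    return sample

def predict_stump(stump, x):
    """Predict +1 or -1 from a (threshold, polarity) pair."""
    threshold, polarity = stump
    return 1 if polarity * (x - threshold) >= 0 else -1

def train_stump(rows):
    """Exhaustively pick the threshold and polarity with the fewest errors.
    Assumption: a stump replaces the per-node boosted learner of the paper."""
    best_errors, best_stump = None, None
    for threshold in sorted({x for x, _ in rows}):
        for polarity in (1, -1):
            stump = (threshold, polarity)
            errors = sum(1 for x, y in rows if predict_stump(stump, x) != y)
            if best_errors is None or errors < best_errors:
                best_errors, best_stump = errors, stump
    return best_stump

def map_phase(partition, per_class, seed):
    """Work done independently on one node: balance the partition, fit a learner."""
    rng = random.Random(seed)
    return train_stump(balanced_sample(partition, per_class, rng))

def reduce_phase(stumps):
    """Combine the per-node learners into one majority-vote classifier."""
    def ensemble(x):
        votes = sum(predict_stump(s, x) for s in stumps)
        return 1 if votes >= 0 else -1
    return ensemble

# Toy imbalanced data: 95 negatives near 0, 5 positives near 2.
data = [(i / 100, -1) for i in range(95)] + [(2.0 + i / 100, 1) for i in range(5)]
partitions = [data[i::3] for i in range(3)]  # stand-in for data split across nodes
stumps = [map_phase(p, per_class=10, seed=i) for i, p in enumerate(partitions)]
ensemble = reduce_phase(stumps)
```

Because each call to `map_phase` touches only its own partition, the list comprehension could be replaced by a process pool or a MapReduce job without changing the logic. How the paper weights samples and learners is not stated in the abstract and is not reproduced here.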