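Personal Information
Professor
Doctoral Supervisor
Master's Supervisor
Gender: Male
Alma Mater: Northeastern University
Degree: Ph.D.
Affiliation: School of Control Science and Engineering
Disciplines: Applied Mathematics; Control Theory and Control Engineering
Office: Innovation Park Building A0620
Contact: Tel. (+86-411) 84726020 (home), (+86-411) 84709380 (office); Fax: (+86-411) 84707579; Mobile: (+86-411) 13130042458
Email: xdliuros@dlut.edu.cn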
A fast rank mutual information based decision tree and its implementation via Map-Reduce
Paper Type: Journal Article
Publication Date: 2018-05-25
Journal: Concurrency and Computation: Practice and Experience
Indexed In: SCIE, EI
Volume: 30
Issue: 10
ISSN: 1532-0626
Keywords: classification; decision trees; dominance rough sets; Map-Reduce; parallel computing
Abstract: To address the time-consuming determination of splitting attributes and splitting points in classic rank mutual information based decision trees, this paper establishes a fast rank mutual information based decision tree (FRMIDT) for classification problems. First, the proposed FRMIDT algorithm improves speed by applying a max-relevance and min-redundancy criterion to remove redundant attributes when each tree node is built. Then, the fuzzy c-means algorithm is employed to determine the splitting points for further acceleration. In addition, a parallel implementation (MR-FRMIDT) is developed in the Map-Reduce framework for medium- or large-scale data classification. Several comparative studies are conducted on UCI benchmark data sets. Compared with the classic rank mutual information based decision tree on 12 data sets, the proposed FRMIDT model effectively reduces computational time while maintaining testing accuracy. Furthermore, FRMIDT performs comparably to other traditional decision tree classifiers, including BFT, C4.5, LAD, NBT, and SC. A comparison with monotonic decision trees based on 7 popular splitting measures on several data sets also illustrates the effectiveness of FRMIDT in monotonic classification. Finally, experimental analysis on 6 other data sets shows that the proposed MR-FRMIDT is feasible and has good parallel performance, reducing execution time and avoiding memory restrictions.
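The abstract names three building blocks without giving formulas. The Python sketch below illustrates each one under stated assumptions; it is not the authors' code. `rank_mutual_information` uses the ascending dominance-set form of rank mutual information as it is commonly written for monotonic classification. `mrmr_filter` assumes the difference form of the max-relevance, min-redundancy criterion, with rank mutual information substituted for Shannon mutual information. `fcm_split_point` assumes a two-cluster fuzzy c-means on a single attribute, with the candidate split placed at the midpoint of the two cluster centres. The function names, the fuzzifier `m = 2`, and the initialisation at the attribute's extremes are all hypothetical choices.

```python
import math
from typing import List, Sequence


def rank_mutual_information(a: Sequence[float], d: Sequence[float]) -> float:
    """Ascending rank mutual information between attribute `a` and decision `d`.

    Dominance set of sample i under one variable v: {j : v[i] <= v[j]}.
    RMI = -(1/n) * sum_i log(|A_i| * |D_i| / (n * |A_i & D_i|)).
    """
    n = len(a)
    total = 0.0
    for i in range(n):
        dom_a = {j for j in range(n) if a[i] <= a[j]}
        dom_d = {j for j in range(n) if d[i] <= d[j]}
        common = len(dom_a & dom_d)  # >= 1, since i belongs to both sets
        total += math.log(len(dom_a) * len(dom_d) / (n * common))
    return -total / n


def mrmr_filter(columns: List[Sequence[float]], d: Sequence[float], k: int) -> List[int]:
    """Greedy max-relevance, min-redundancy ranking (difference form) using RMI."""
    relevance = [rank_mutual_information(c, d) for c in columns]
    selected: List[int] = []
    remaining = list(range(len(columns)))

    def score(j: int) -> float:
        # Relevance to the decision minus mean redundancy with already-kept attributes.
        if not selected:
            return relevance[j]
        redundancy = sum(rank_mutual_information(columns[j], columns[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties resolve to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


def fcm_split_point(values: Sequence[float], m: float = 2.0,
                    max_iter: int = 100, tol: float = 1e-9) -> float:
    """Two-cluster fuzzy c-means on one attribute; split at the midpoint of the centres."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # constant attribute: nothing to split
    centres = [lo, hi]  # hypothetical initialisation at the extremes
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Membership update: u0 = 1 / (1 + (d0 / d1)^(2 / (m - 1))).
        memberships = []
        for x in values:
            d0, d1 = abs(x - centres[0]), abs(x - centres[1])
            if d0 == 0.0:
                u0 = 1.0
            elif d1 == 0.0:
                u0 = 0.0
            else:
                u0 = 1.0 / (1.0 + (d0 / d1) ** p)
            memberships.append((u0, 1.0 - u0))
        # Centre update: membership^m weighted mean of the attribute values.
        new_centres = []
        for c in range(2):
            w = [u[c] ** m for u in memberships]
            new_centres.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return (centres[0] + centres[1]) / 2.0
```

For example, `mrmr_filter([[5, 5, 5, 5], [0, 1, 2, 3]], [0, 1, 2, 3], 1)` keeps only the second attribute, because a constant attribute has zero rank mutual information with the decision. The pairwise dominance sets make this sketch quadratic in the number of samples per attribute; it does not attempt the Map-Reduce parallelisation described in the abstract.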