Indexed by:Conference paper
Date of Publication:2015-10-29
Included Journals:EI, CPCI-S, SCIE, Scopus
Page Number:1539-1544
Key Words:Microarray Data; MapReduce Programming Model; Parallel Information Fusion
Abstract:Classification of microarray data has always been a challenging task because of the enormous number of genes. Finding a small, closely related gene set that accurately classifies disease cells is an important research problem. Integrating biological knowledge into genomic analysis to help improve the interpretation of the results is an effective approach. In this paper, the affinity propagation (AP) clustering algorithm is chosen to analyze the impact of biological similarity on the results. We integrate GO semantic similarity into AP clustering for granule construction. Using the MapReduce programming model, a parallel information fusion method is proposed. The similarity matrix construction and message passing steps of the AP algorithm are parallelized using MapReduce. A parallel randomly directed hill climb ensemble pruning (RandomDHCEP) method based on MapReduce is introduced for ensemble pruning. An instance analysis illustrates the process of affinity propagation and ensemble pruning using an iterative MapReduce program. The proposed method scales well on large data as the number of nodes increases, and it also provides higher classification accuracy than using the whole gene set for classification.
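
Illustrative sketch (not taken from the paper): the abstract states that AP message passing is parallelized with MapReduce but gives no implementation details. The Python below shows one way a single round of the standard AP updates could be phrased as two map/reduce-style jobs, with responsibilities grouped by row and availabilities grouped by column. The helper `map_reduce` is an in-memory stand-in for a real MapReduce job, and all function names are hypothetical. GO semantic similarity and RandomDHCEP are not modeled; the similarity matrix `S` is assumed to be given, with preferences on its diagonal.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results


def update_responsibilities(S, R, A, damping=0.5):
    """Job 1: r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]."""
    n = len(S)  # assumes n >= 2

    def mapper(cell):
        i, k = cell
        yield i, (k, A[i][k] + S[i][k])  # key by row i

    def reducer(i, values):
        score = dict(values)  # candidate k -> a(i,k) + s(i,k)
        for k in range(n):
            rival = max(v for j, v in score.items() if j != k)
            yield i, k, damping * R[i][k] + (1 - damping) * (S[i][k] - rival)

    cells = [(i, k) for i in range(n) for k in range(n)]
    new_R = [[0.0] * n for _ in range(n)]
    for i, k, value in map_reduce(cells, mapper, reducer):
        new_R[i][k] = value
    return new_R


def update_availabilities(R, A, damping=0.5):
    """Job 2: a(i,k) from the positive responsibilities sent to candidate k."""
    n = len(R)

    def mapper(cell):
        i, k = cell
        yield k, (i, R[i][k])  # key by column k

    def reducer(k, values):
        col = dict(values)  # sender i -> r(i,k)
        pos_sum = sum(max(0.0, r) for i, r in col.items() if i != k)
        for i in range(n):
            if i == k:
                new = pos_sum  # self-availability a(k,k)
            else:
                # exclude sender i's own contribution from the sum
                new = min(0.0, col[k] + pos_sum - max(0.0, col[i]))
            yield i, k, damping * A[i][k] + (1 - damping) * new

    cells = [(i, k) for i in range(n) for k in range(n)]
    new_A = [[0.0] * n for _ in range(n)]
    for i, k, value in map_reduce(cells, mapper, reducer):
        new_A[i][k] = value
    return new_A


def exemplars(R, A):
    """Each point i picks the candidate k that maximizes a(i,k) + r(i,k)."""
    n = len(R)
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]
```

In an iterative driver, the two jobs would be called alternately until the exemplar assignment stops changing. Keying the first job by row and the second by column reflects the data dependencies of the two updates; how the paper actually partitions the messages is not stated in the abstract.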