Indexed by:Conference paper
Date of Publication:2013-10-06
Included Journals:EI, CPCI-S, Scopus
Page Number:58-62
Key Words:protein subcellular localization; big data; multiplex protein; active learning; transductive learning
Abstract:Protein subcellular localization prediction based on machine learning is a research focus in bioinformatics. The rapid growth of protein sequences in databases makes it hard for experts alone to label enough protein samples to train a learner that achieves satisfactory prediction results. This paper proposes a novel integrated method for human multiplex protein subcellular localization prediction. To avoid manually evaluating and labeling the large volume of unseen proteins, the method uses an active sample selection algorithm that picks out protein samples with non-experimental labels as supplementary training data. These samples help train an ensemble predictor, which comprises a protein identifying module, a single-label classifier, and a multi-label classifier. Numerical experiments show the effectiveness of the proposed approach.