顾宏
Paper type: Conference paper
Publication date: 2013-10-06
Indexed in: EI, CPCI-S, Scopus
Pages: 58-62
Keywords: protein subcellular localization; big data; multiplex protein; active learning; transductive learning
Abstract: Protein subcellular localization prediction based on machine learning is a research focus in bioinformatics. The rapid growth of protein sequences in databases makes it hard for experts alone to label enough protein samples to train a learner that achieves satisfactory prediction results. This paper proposes a novel integrated method for human multiplex protein subcellular localization prediction. In this method, to avoid manually evaluating and labeling the big data of unseen proteins, an active sample selection algorithm picks out protein samples with non-experimental labels as supplementary training data to help train an ensemble predictor, which comprises a protein identifying module, a single-label classifier, and a multi-label classifier. Numerical experiments show the effectiveness of the proposed approach.