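Personal Information
Professor
Doctoral Supervisor
Master's Supervisor
Gender: Female
Alma Mater: Dalian University of Technology
Degree: Ph.D.
Department: School of Computer Science and Technology
Disciplines: Computer Application Technology; Computer Software and Theory
Office: Innovation Building A930
Email: lils@dlut.edu.cn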
Boosting performance of gene mention tagging system by hybrid methods
Paper Type: Journal Article
Publication Date: 2012-02-01
Journal: JOURNAL OF BIOMEDICAL INFORMATICS
Indexed In: PubMed, SCIE, EI, Scopus
Volume: 45
Issue: 1
Pages: 156-164
ISSN: 1532-0464
Keywords: Hybrid methods; Gene mention tagging; Named entity recognition; Bioinformatics; Biomedical literature
Abstract: Named entity recognition (NER) in biomedical literature is currently one of the most actively studied natural language processing (NLP) problems internationally. To achieve higher performance, we present a hybrid experimental framework for the gene mention tagging task. Six classifiers are first built with four toolkits (CRF++, YamCha, Maximum Entropy (ME) and MALLET) using different training methods and feature sets, and are then combined with each of three hybrid methods: a simple set operation method, a voting method and a two-layer stacking method. Experiments on the BioCreative II GM task corpus show that, without any post-processing, the three hybrid methods achieve F-measures of 87.40%, 87.31% and 87.70% respectively, all higher than that of any single classifier. Our best hybrid method (two-layer stacking) achieves an F-measure of 88.42% after post-processing, outperforming most state-of-the-art systems. We also discuss how the number, performance and divergence of the single classifiers in each hybrid method influence the performance of the ensemble system, and analyze why our hybrid models improve performance. (C) 2011 Elsevier Inc. All rights reserved.
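The abstract names the set operation and voting hybrids without stating their exact rules. The following is a minimal Python sketch of how such combinations could work, assuming each classifier outputs a set of exact-match gene mention spans keyed by hypothetical (sentence_id, start, end) tuples. The union/intersection rules, the `min_votes` threshold, the toy data and all function names are illustrative assumptions, not the paper's implementation; the two-layer stacking method is not shown. The exact-match F-measure here is also a simplification, since the official BioCreative II GM scoring additionally credits listed alternative mention boundaries.

```python
from collections import Counter

# Hypothetical span representation: (sentence_id, start_offset, end_offset).
Span = tuple[str, int, int]


def union_combine(outputs: list[set[Span]]) -> set[Span]:
    """Set-operation hybrid (assumed rule): accept a mention if ANY classifier tags it."""
    result: set[Span] = set()
    for spans in outputs:
        result |= spans
    return result


def intersection_combine(outputs: list[set[Span]]) -> set[Span]:
    """Set-operation hybrid (assumed rule): accept a mention only if EVERY classifier tags it."""
    if not outputs:
        return set()
    result = set(outputs[0])
    for spans in outputs[1:]:
        result &= spans
    return result


def vote_combine(outputs: list[set[Span]], min_votes: int) -> set[Span]:
    """Voting hybrid (assumed rule): accept a mention tagged by at least min_votes classifiers."""
    counts = Counter(span for spans in outputs for span in spans)
    return {span for span, n in counts.items() if n >= min_votes}


def f_measure(predicted: set[Span], gold: set[Span]) -> float:
    """Exact-match F1: harmonic mean of precision and recall."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): gold mentions and three classifiers' outputs.
gold = {("s1", 0, 4), ("s1", 10, 15), ("s2", 3, 8)}
clf_a = {("s1", 0, 4), ("s1", 10, 15), ("s2", 0, 2)}
clf_b = {("s1", 0, 4), ("s2", 3, 8)}
clf_c = {("s1", 0, 4), ("s1", 10, 15), ("s2", 3, 8), ("s2", 20, 25)}
outputs = [clf_a, clf_b, clf_c]

for name, spans in [("A", clf_a), ("B", clf_b), ("C", clf_c)]:
    print(f"single {name}: F1 = {f_measure(spans, gold):.3f}")
print(f"union:        F1 = {f_measure(union_combine(outputs), gold):.3f}")
print(f"intersection: F1 = {f_measure(intersection_combine(outputs), gold):.3f}")
print(f"vote (>=2):   F1 = {f_measure(vote_combine(outputs, 2), gold):.3f}")
```

In this toy example, union favors recall and intersection favors precision, while majority voting keeps exactly the three gold mentions because each of the classifiers' false positives is proposed by only one classifier. Whether voting wins on real data depends on how diverse and how accurate the individual classifiers are, which is the question the abstract says the paper analyzes.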