Huiwei Zhou (周惠巍)

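Personal Information

Associate Professor

Doctoral Supervisor

Master's Supervisor

Gender: Female

Alma mater: Dalian University of Technology

Degree: Ph.D.

Affiliation: School of Computer Science and Technology

Office: Room B911, Innovation Park Building, Dalian University of Technology

Email: zhouhuiwei@dlut.edu.cn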

Publications

A Multistage Gene Normalization System Integrating Multiple Effective Methods

Paper type: Journal article

Publication date: 2013-12-12

Journal: PLOS ONE

Indexed in: SCIE, PubMed, Scopus

Volume: 8

Issue: 12

Pages: e81956

ISSN: 1932-6203

Abstract: Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system consisting of four major subtasks: pre-processing, dictionary matching, ambiguity resolution, and filtering. For pre-processing, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM test set. In the dictionary matching stage, exact and approximate matching between gene names and the EntrezGene lexicon are combined. For ambiguity resolution, we propose a semantic similarity disambiguation method based on Munkres' Assignment Algorithm. In the final step, a Wikipedia-based filter removes false positives. Experimental results show that the presented system achieves an F-score of 90.1%, outperforming most state-of-the-art systems.
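The abstract names Munkres' Assignment Algorithm but does not describe how the similarity scores are built or what is paired with what. The following is only a minimal sketch of the assignment step itself, under assumptions: the function names `hungarian_min_cost` and `best_pairing` and the similarity matrix `sim` are hypothetical, and the code is not taken from the paper. Given a matrix of similarity scores between two sets of items, it finds the one-to-one pairing with the highest total similarity.

```python
def hungarian_min_cost(cost):
    """Munkres (Hungarian) assignment for a matrix with rows <= columns.

    Returns a list where entry i is the column assigned to row i,
    chosen so that the total cost is minimal.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "expects at least as many columns as rows"
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row matched to column j, 0 if free
    way = [0] * (m + 1)   # previous column on the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so that one more column becomes reachable.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: path found
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def best_pairing(sim):
    """Maximize total similarity by minimizing (max_sim - sim)."""
    top = max(max(row) for row in sim)
    cost = [[top - s for s in row] for row in sim]
    return hungarian_min_cost(cost)


# Hypothetical similarity scores (rows and columns are illustrative only).
sim = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.10],
    [0.10, 0.30, 0.70],
]
print(best_pairing(sim))  # [1, 0, 2]
```

In this toy matrix, picking the best column for each row independently would send rows 0 and 1 to the same column. The assignment formulation instead gives up a little on row 0 (0.80 rather than 0.90) so that row 1 can keep its only strong match, which raises the total similarity from 1.80 to 2.35.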