周惠巍

个人信息Personal Information

副教授

博士生导师

硕士生导师

性别:女

毕业院校:大连理工大学

学位:博士

所在单位:计算机科学与技术学院

学科:人工智能

办公地点:大连理工大学创新园大厦B911

电子邮箱:zhouhuiwei@dlut.edu.cn

扫描关注

论文成果

当前位置: 中文主页 >> 科学研究 >> 论文成果

Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes

点击次数:

论文类型:期刊论文

发表时间:2020-01-30

发表刊物:BMC BIOINFORMATICS

收录刊物:PubMed、SCIE

卷号:21

期号:1

页面范围:35

ISSN号:1471-2105

关键字:Entity recognition; Entity normalization; Knowledge base; Attention mechanism; Contextual word representations

摘要:Background Automated biomedical named entity recognition and normalization serves as the basis for many downstream applications in information management. However, this task is challenging due to name variations and entity ambiguity. A biomedical entity may have multiple variants and a variant could denote several different entity identifiers. Results To remedy the above issues, we present a novel knowledge-enhanced system for protein/gene named entity recognition (PNER) and normalization (PNEN). On one hand, a large amount of entity name knowledge extracted from biomedical knowledge bases is used to recognize more entity variants. On the other hand, structural knowledge of entities is extracted and encoded as identifier (ID) embeddings, which are then used for better entity normalization. Moreover, deep contextualized word representations generated by pre-trained language models are also incorporated into our knowledge-enhanced system for modeling multi-sense information of entities. Experimental results on the BioCreative VI Bio-ID corpus show that our proposed knowledge-enhanced system achieves 0.871 F1-score for PNER and 0.445 F1-score for PNEN, respectively, leading to a new state-of-the-art performance. Conclusions We propose a knowledge-enhanced system that combines both entity knowledge and deep contextualized word representations. Comparison results show that entity knowledge is beneficial to the PNER and PNEN task and can be well combined with contextualized information in our system for further improvement.