杨志豪

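Personal Information

Professor

Doctoral Supervisor

Master's Supervisor

Gender: Male

Alma Mater: 大连理工大学 (Dalian University of Technology)

Degree: Ph.D.

Affiliation: School of Computer Science and Technology

Email: yangzh@dlut.edu.cn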

Publications

Exploiting the contextual cues for bio-entity name recognition in biomedical literature

Paper Type: Journal Article

Publication Date: 2008-08-01

Journal: JOURNAL OF BIOMEDICAL INFORMATICS

Indexed In: SCIE, EI, PubMed, Scopus

Volume: 41

Issue: 4

Pages: 580-587

ISSN: 1532-0464

Keywords: text mining; information extraction; named entity recognition; conditional random fields; contextual cue

Abstract: To extract biomedical information about bio-entities from the huge amount of biomedical literature, the first key step is recognizing their names in the literature, which remains a challenging task due to the irregularities and ambiguities in bio-entity nomenclature. The recognition performance of the currently popular methods, machine learning techniques, still leaves much room for improvement. This paper presents a Conditional Random Field-based approach for recognizing the names of bio-entities, including gene, protein, cell type, and cell line, and studies methods of improving performance by exploiting contextual cues, including bracket pairs, heuristic syntax structure, and interaction word cues. Experimental results on both the JNLPBA2004 and BioCreative2004 task 1A datasets show that these methods can improve Conditional Random Field-based recognition performance by more than 2 points in F-score. (C) 2008 Elsevier Inc. All rights reserved.