Incorporating dictionary features into conditional random fields for gene/protein named entity recognition

Indexed by:Conference paper

Date of Publication:2007-05-22

Included Journals:EI, CPCI-S

Volume:4819

Page Number:162-173

Key Words:BioNER; dictionary feature; CRF

Abstract:Biomedical Named Entity Recognition (BioNER) is an important preliminary step for biomedical text mining. Previous researchers built dictionaries of gene/protein names from online databases and incorporated them into machine learning models as features, but the resulting gains were very limited. This paper gives a quality assessment of four dictionaries derived from online resources, and investigates the impact of two factors (i.e., dictionary coverage and noisy terms) that may lead to the poor performance of dictionary features. Experiments compare the performance of the external dictionaries with that of a dictionary derived from the GENETAG corpus, using Conditional Random Fields (CRFs) with dictionary features. We also examine how these factors affect long names and short names. The results show that low coverage of long names and noise among short names are the main problems of current online resources, and that a high-quality dictionary could substantially improve the accuracy of BioNER.
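
The abstract does not say how the dictionary features are encoded, so the following is only a minimal sketch of one common approach, not the paper's method: greedy longest-match lookup that assigns each token a B-DICT / I-DICT / O tag, which could then be passed to a CRF alongside other token features. The function names, the tag scheme, the `max_len` limit, and the example names are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def build_dictionary(names: List[str]) -> Set[Tuple[str, ...]]:
    """Store each name as a tuple of lowercased, whitespace-separated tokens."""
    return {tuple(name.lower().split()) for name in names if name.strip()}


def dictionary_tags(
    tokens: List[str],
    dictionary: Set[Tuple[str, ...]],
    max_len: int = 5,  # assumed cap on name length, in tokens
) -> List[str]:
    """Greedy longest-match lookup: one of B-DICT, I-DICT or O per token."""
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so long names win over short ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            tags[i] = "B-DICT"
            for j in range(i + 1, i + matched):
                tags[j] = "I-DICT"
            i += matched
        else:
            i += 1
    return tags


def token_features(
    tokens: List[str], dictionary: Set[Tuple[str, ...]]
) -> List[Dict[str, str]]:
    """Per-token feature dicts; the dictionary tag is one feature among others."""
    tags = dictionary_tags(tokens, dictionary)
    return [
        {"word.lower": tok.lower(), "dict_tag": tag}
        for tok, tag in zip(tokens, tags)
    ]


# Illustrative names only; not taken from the paper's dictionaries.
example_dictionary = build_dictionary(["tumor necrosis factor", "p53", "factor"])
example_tokens = ["The", "tumor", "necrosis", "factor", "binds", "p53"]
example_tags = dictionary_tags(example_tokens, example_dictionary)
```

In this sketch a short entry such as "factor" is shadowed whenever a longer entry covers it, but on its own it would tag every occurrence of that common word. That is one way the short-name noise described in the abstract could arise.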
