Wang Jian (王健)

Personal Information

Professor

Doctoral Supervisor

Master's Supervisor

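Gender: Female

Alma mater: Dalian University of Technology

Degree: Ph.D.

Affiliation: School of Computer Science and Technology

Discipline: Computer Application Technology

Office: Room B811, Innovation Park Building

Phone: 0411-84706009-2811

Email: wangjian@dlut.edu.cn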

Publications

CIDExtractor: a chemical-induced disease relation extraction system for biomedical literature

Paper type: Conference paper

Publication date: 2016-01-01

Indexed in: CPCI-S

Pages: 994-1001

Keywords: Information extraction; Semi-supervised learning; Co-Training

Abstract: Adverse drug reactions between chemicals and diseases have made chemical-disease relations (CDR) a research focus. In this paper, we present a chemical-induced disease (CID) relation extraction system, CIDExtractor, to extract CID relations from biomedical literature. CIDExtractor first employs a sentence-level classifier to extract the CID relations located within a single sentence. To construct the classifier, a sentence-level training set is manually annotated, and the Co-Training algorithm is then used to exploit the unlabeled data, with the feature kernel and the graph kernel as two independent views. CIDExtractor then uses a document-level classifier to extract the CID relations spanning multiple sentences. This classifier utilizes the document-level information (features) of the chemical and disease pair. Finally, post-processing rules are applied to the union of the two classifiers' outputs to generate the final results. Experimental results on the test set of the BioCreative V CDR CID subtask show that CIDExtractor achieves better performance (an F-score of 67.72%) than the state-of-the-art methods. The online CIDExtractor demonstration system is available at http://202.118.75.18:8888/cdr-dut-ir/cid.html.