Indexed by:Conference paper
Date of Publication:2016-01-01
Included Journals:CPCI-S
Page Number:994-1001
Key Words:Information extraction; Semi-supervised learning; Co-Training
Abstract:Adverse drug reactions between chemicals and diseases have made chemical-disease relations (CDR) a research focus. In this paper, we present a chemical-induced disease (CID) relation extraction system, CIDExtractor, which extracts CID relations from biomedical literature. CIDExtractor first employs a sentence-level classifier to extract CID relations whose chemical and disease mentions occur in the same sentence. To construct this classifier, a sentence-level training set is manually annotated, and the Co-Training algorithm is then used to exploit unlabeled data, with the feature kernel and the graph kernel as two independent views. CIDExtractor then uses a document-level classifier to extract CID relations that span multiple sentences. This classifier uses document-level information (features) about the chemical-disease pair. Finally, post-processing rules are applied to the union of the two classifiers' outputs to generate the final results. Experimental results on the test set of the BioCreative V CDR CID subtask show that CIDExtractor achieves better performance (an F-score of 67.72%) than the state-of-the-art methods. The online CIDExtractor demonstration system is available at http://202.118.75.18:8888/cdr-dut-ir/cid.html.
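
The Co-Training step described in the abstract can be illustrated with a minimal sketch. This is not the CIDExtractor implementation: the paper's two views are a feature kernel and a graph kernel, whereas this sketch assumes toy numeric vectors for each view and swaps in a simple nearest-centroid classifier. All function names, parameters, and data below are hypothetical and chosen only for illustration.

```python
from statistics import mean


def centroid_fit(X, y):
    """Compute one centroid per class (stand-in for a kernel classifier)."""
    cents = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        cents[label] = [mean(col) for col in zip(*rows)]
    return cents


def centroid_predict(cents, x):
    """Return (label, confidence) for x; assumes at least two classes.

    Confidence is the gap between the squared distances to the two
    nearest centroids, so a larger gap means a more confident prediction.
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)), label)
        for label, c in cents.items()
    )
    (d0, label), (d1, _) = dists[0], dists[1]
    return label, d1 - d0


def co_train(view_a, view_b, labels, unl_a, unl_b, rounds=5, per_round=1):
    """Co-Training over two views of the same examples.

    In each round, a classifier is trained on each view, and each one
    labels the unlabeled examples it is most confident about. Those
    examples are moved, in both views, into the shared labeled set.
    """
    La, Lb, y = list(view_a), list(view_b), list(labels)
    pool = list(range(len(unl_a)))  # indices of still-unlabeled examples
    for _ in range(rounds):
        if not pool:
            break
        model_a = centroid_fit(La, y)
        model_b = centroid_fit(Lb, y)
        picked = {}
        for model, unl in ((model_a, unl_a), (model_b, unl_b)):
            ranked = sorted(
                pool,
                key=lambda i: centroid_predict(model, unl[i])[1],
                reverse=True,
            )
            for i in ranked[:per_round]:
                # If both views pick the same example, keep the first label.
                picked.setdefault(i, centroid_predict(model, unl[i])[0])
        for i, label in picked.items():
            La.append(unl_a[i])
            Lb.append(unl_b[i])
            y.append(label)
            pool.remove(i)
    return centroid_fit(La, y), centroid_fit(Lb, y)


# Toy data (hypothetical): two labeled and four unlabeled sentence pairs,
# each described by a one-dimensional vector in each view.
model_a, model_b = co_train(
    view_a=[[0.0], [1.0]],
    view_b=[[0.0], [1.0]],
    labels=[0, 1],
    unl_a=[[0.1], [0.9], [0.2], [0.8]],
    unl_b=[[0.2], [0.8], [0.1], [0.9]],
)
```

In this sketch each view contributes the examples it is most certain about, so an example that is ambiguous in one view can still be labeled through the other. A real system would replace `centroid_fit` and `centroid_predict` with classifiers built on the two kernels and would typically apply a confidence threshold before accepting a pseudo-label.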