Indexed by:Journal article
Date of Publication:2015-10-15
Journal:Journal of Computational Information Systems
Included Journals:EI, Scopus
Volume:11
Issue:20
Page Number:7387-7395
ISSN No.:1553-9105
Abstract:To avoid extracting uncertain statements as factual information, the detection of hedges and their scope becomes an important step in biomedical text mining. Current approaches learn the detection models from labeled data only. However, such approaches cannot make further progress due to the limited amount of training data and the difference between the training and working data. We propose a co-training approach that uses the limited labeled data to leverage unlabeled data and boost the detection performance for hedge cues and their scope. Experiments are carried out on the biomedical corpus of the CoNLL 2010 Shared Task and on free data derived from biomedical literature. Both the test data of the corpus and the free data are used as the unlabeled data. Experimental results show that the test data helps more than the free data on both tasks. The best F-score achieved is 88.12% for hedge cue identification and 63.09% for hedge scope detection, both of which significantly outperform previous systems. The co-training system can transfer the distribution of the unlabeled data to the labeled training data, effectively improving performance on the unlabeled data. Copyright © 2015 Binary Information Press.