Zhihao Yang (杨志豪)

Personal Information

Professor

Doctoral Supervisor

Master's Supervisor

Gender: Male

Alma mater: Dalian University of Technology

Degree: Ph.D.

Affiliation: School of Computer Science and Technology

Email: yangzh@dlut.edu.cn

Publications

Uncertainty sampling-based active learning for protein-protein interaction extraction from biomedical literature

Paper type: Journal article

Publication date: 2009-09-01

Journal: EXPERT SYSTEMS WITH APPLICATIONS

Indexed in: SCIE, EI, Scopus

Volume: 36

Issue: 7

Pages: 10344-10350

ISSN: 0957-4174

Keywords: Active learning; Uncertainty sampling; Protein-protein interaction extraction

Abstract: Protein-protein interaction (PPI) extraction from biomedical literature has become a research focus as the volume of biomedical literature has grown rapidly. Many methods have been proposed for PPI extraction, including natural language processing techniques and machine learning approaches. One problem with applying machine learning approaches to PPI extraction is that large amounts of data are available, but the cost of labeling them correctly prohibits their use. To reduce the human labeling effort while maintaining PPI extraction performance, the paper presents an uncertainty sampling-based method of active learning (USAL) in a lexical feature-based SVM model to select the most informative unlabeled samples for labeling. In addition, some specific samples are ignored to speed up the learning process while maintaining the desired accuracy. Experimental results on the AIMED and CB corpora show that our method can reduce the labeling effort by 40% and 20%, respectively, without degrading performance. (C) 2009 Elsevier Ltd. All rights reserved.
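The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how uncertainty is measured, so this example assumes a common choice for SVMs: the samples whose decision value lies closest to the separating hyperplane are treated as the most uncertain. The function names, the linear decision function, and the toy weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def decision_value(weights: Sequence[float], bias: float,
                   features: Sequence[float]) -> float:
    """Signed score of a linear classifier: w . x + b (assumed model form)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def select_most_uncertain(weights: Sequence[float], bias: float,
                          pool: Sequence[Sequence[float]],
                          batch_size: int) -> List[int]:
    """Return indices of the `batch_size` unlabeled samples whose score is
    closest to zero, i.e. closest to the hyperplane under the assumption above."""
    margins = [(abs(decision_value(weights, bias, x)), i)
               for i, x in enumerate(pool)]
    margins.sort()  # smallest absolute score first; ties broken by index
    return [i for _, i in margins[:batch_size]]


if __name__ == "__main__":
    # Toy 2-D feature vectors standing in for lexical features (hypothetical).
    unlabeled_pool = [[3.0, 0.0], [1.0, 0.75], [0.0, 2.0], [0.5, 1.0]]
    picked = select_most_uncertain([1.0, -1.0], 0.0, unlabeled_pool, batch_size=2)
    print(picked)  # indices of the samples to send to a human annotator
```

In an active learning loop, the selected samples would be labeled by an annotator, added to the training set, and the model retrained before the next selection round. The paper's additional step of ignoring certain samples is not shown, since the abstract does not specify which samples are skipped.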