杨志豪

个人信息Personal Information

教授

博士生导师

硕士生导师

性别:男

毕业院校:大连理工大学

学位:博士

所在单位:计算机科学与技术学院

电子邮箱:yangzh@dlut.edu.cn

扫描关注

论文成果

当前位置: 中文主页 >> 科学研究 >> 论文成果

Multiple kernel learning in protein-protein interaction extraction from biomedical literature

点击次数:

论文类型:期刊论文

发表时间:2021-01-12

发表刊物:ARTIFICIAL INTELLIGENCE IN MEDICINE

卷号:51

期号:3

页面范围:163-173

ISSN号:0933-3657

关键字:Support vector machines; Multiple kernel learning; Text mining; Information extraction; Protein-protein interaction

摘要:Objective: Knowledge about protein-protein interactions (PPIs) unveils the molecular mechanisms of biological processes. The volume and content of published biomedical literature on protein interactions is expanding rapidly, making it increasingly difficult for interaction database administrators, responsible for content input and maintenance to detect and manually update protein interaction information. The objective of this work is to develop an effective approach to automatic extraction of PPI information from biomedical literature.
   Methods and materials: We present a weighted multiple kernel learning-based approach for automatic PPI extraction from biomedical literature. The approach combines the following kernels: feature-based, tree, graph and part-of-speech (POS) path. In particular, we extend the shortest path-enclosed tree (SPT) and dependency path tree to capture richer contextual information.
   Results: Our experimental results show that the combination of SPT and dependency path tree extensions contributes to the improvement of performance by almost 0.7 percentage units in F-score and 2 percentage units in area under the receiver operating characteristics curve (AUC). Combining two or more appropriately weighed individual will further improve the performance. Both on the individual corpus and cross-corpus evaluation our combined kernel can achieve state-of-the-art performance with respect to comparable evaluations, with 64.41% F-score and 88.46% AUC on the Aimed corpus.
   Conclusions: As different kernels calculate the similarity between two sentences from different aspects. Our combined kernel can reduce the risk of missing important features. More specifically, we use a weighted linear combination of individual kernels instead of assigning the same weight to each individual kernel, thus allowing the introduction of each kernel to incrementally contribute to the performance improvement. In addition, SPT and dependency path tree extensions can improve the performance by including richer context information. (C) 2010 Elsevier B.V. All rights reserved.