Dalian University of Technology
张宪超

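Professor   Doctoral Supervisor   Master's Supervisor

Gender: Male

Alma mater: University of Science and Technology of China

Degree: Doctorate

Affiliation: School of Software; International School of Information and Software

Disciplines: Computer Application Technology; Software Engineering

Email: xczhang@dlut.edu.cn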

Keywords extraction from web pages using semantic link analysis


Paper type: Journal article

Publication date: 2009-06-01

Journal: Journal of Information and Computational Science

Indexed by: EI, Scopus

Volume: 6

Issue: 3

Pages: 1495-1503

ISSN: 1548-7741

Abstract: We introduce, for the first time, the community identification model to keyword extraction from web pages. The Max-Flow algorithm can be used to identify a community in a local web graph that is concentrated on one topic. A web page likewise contains a single, relatively broad topic. Based on this observation, we treat the words in a web page as nodes and the relations between words as edges to construct a graph, rank the words in the graph to find a few highly important words as Seed-Keywords, and then feed the graph and the Seed-Keywords to a modified version of the Max-Flow algorithm, which outputs a community whose members are taken as the Target-Keywords. In this process, we perform word sense disambiguation within a context we call a 'Topic-Block', and compare its precision against a coarser-grained context (the whole web page) and a finer-grained context (the basic HTML element). The experimental results show that Topic-Block based word sense disambiguation is effective and that the Max-Flow algorithm can extract a variable number of keywords that adapts to the size of the web page. 1548-7741/Copyright © 2009 Binary Information Press.
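
The pipeline described in the abstract (word graph, seed ranking, max-flow community) can be illustrated with a minimal Python sketch. This is not the paper's method: the abstract does not say how edges are built, how words are ranked, or how the Max-Flow algorithm is modified. The sketch assumes co-occurrence edges, weighted-degree ranking, and a standard s-t min-cut formulation in which a virtual source feeds the seeds and every non-seed word drains to a virtual sink with unit capacity. The function names `cooccurrence_graph`, `rank_seeds`, and `community` are hypothetical.

```python
from collections import defaultdict, deque

def cooccurrence_graph(sentences, window=2):
    """Assumed edge model: weight = co-occurrences within `window` positions."""
    weights = defaultdict(int)
    for words in sentences:
        for i, w in enumerate(words):
            for v in words[i + 1 : i + 1 + window]:
                if v != w:
                    weights[frozenset((w, v))] += 1
    return weights

def rank_seeds(weights, k):
    """Assumed ranking: top-k words by weighted degree (ties broken by name)."""
    score = defaultdict(int)
    for pair, w in weights.items():
        for word in pair:
            score[word] += w
    return set(sorted(score, key=lambda word: (-score[word], word))[:k])

def source_side_of_min_cut(cap, s, t):
    """Edmonds-Karp max flow; returns the nodes still reachable from s."""
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:  # no augmenting path left: the cut is found
            return set(parent) - {s}
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck

def community(weights, seeds):
    """Seeds plus the words on their side of the minimum cut."""
    src, sink = object(), object()  # virtual source and sink
    k = len(seeds)                  # scale word-word capacities by seed count
    cap = defaultdict(lambda: defaultdict(float))
    nodes = set()
    for pair, w in weights.items():
        u, v = tuple(pair)
        cap[u][v] += w * k
        cap[v][u] += w * k
        nodes.update(pair)
    for word in seeds:
        cap[src][word] = float("inf")
    for word in nodes - set(seeds):
        cap[word][sink] = 1.0
    return source_side_of_min_cut(cap, src, sink)

# Toy graph: a 4-word clique (a, b, c, d) with a loosely attached chain d-x-y-z.
clique = ["a", "b", "c", "d"]
demo_weights = {frozenset((u, v)): 1
                for i, u in enumerate(clique) for v in clique[i + 1:]}
for u, v in [("d", "x"), ("x", "y"), ("y", "z")]:
    demo_weights[frozenset((u, v))] = 1

seeds = rank_seeds(demo_weights, k=2)      # {'a', 'd'}
keywords = community(demo_weights, seeds)
print(sorted(keywords))                    # → ['a', 'b', 'c', 'd']
```

In this toy graph the cut separates the densely connected clique from the chain, so the number of returned keywords is determined by the graph structure rather than fixed in advance, which is the adaptive behavior the abstract attributes to the Max-Flow step.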
