
Professor, Doctoral Supervisor, Master's Supervisor
Gender: Male
Alma mater: University of Science and Technology of China
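Degree: Doctorate
Affiliation: School of Software; International School of Information and Software
Disciplines: Computer Application Technology; Software Engineering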
Posted: 2019-03-11
Paper type: Journal article
Publication date: 2009-06-01
Journal: Journal of Information and Computational Science
Indexed in: Scopus, EI
Volume: 6
Issue: 3
Pages: 1495-1503
ISSN: 1548-7741
Abstract: We introduce the community-identification model to keyword extraction from web pages. A max-flow algorithm can identify a community in a local web graph that is concentrated on one topic, and a web page likewise covers a single, relatively broad topic. Based on this observation, we treat the words in a web page as nodes and the relations between words as edges to construct a graph, rank the words in the graph to select the most important ones as Seed-Keywords, and then feed the graph and the Seed-Keywords to a modified max-flow algorithm, which outputs a community whose members are taken as Target-Keywords. During this process, we perform word sense disambiguation within a context called a 'Topic-Block', and compare its precision against a coarser-grained context (the whole web page) and a finer-grained context (the basic HTML element). The experimental results show that Topic-Block-based word sense disambiguation is effective and that the max-flow algorithm can extract a number of keywords that adapts to the size of the web page. 1548-7741/Copyright © 2009 Binary Information Press.
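The seed-to-community step described in the abstract can be illustrated with a small sketch. The abstract does not specify the paper's modified max-flow algorithm, so what follows is only the generic min-cut community construction, under stated assumptions: a virtual source is linked to every seed with unlimited capacity, every non-seed word is linked to a virtual sink with capacity 1, and word-word edges carry a uniform capacity `edge_cap`. The function name `extract_community`, the capacity values, and the toy word graph are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque


def extract_community(edges, seeds, edge_cap=2):
    """Return the words on the source side of a minimum cut around the seeds.

    Assumed construction (not the paper's): source -> seed has infinite
    capacity, non-seed -> sink has capacity 1, word-word edges have edge_cap.
    """
    SRC, SNK = "__source__", "__sink__"
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
        cap[u][v] += edge_cap
        cap[v][u] += edge_cap
    for s in seeds:
        cap[SRC][s] = float("inf")
    for n in nodes - set(seeds):
        cap[n][SNK] += 1

    def reachable():
        # BFS over edges that still have residual capacity.
        parent = {SRC: None}
        queue = deque([SRC])
        while queue:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest paths until the sink is cut off.
    while True:
        parent = reachable()
        if SNK not in parent:
            break
        path = []
        v = SNK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= flow
            cap[w][u] += flow

    # Nodes still reachable from the source form the community.
    return set(parent) - {SRC}


# Toy word graph: a dense topical cluster, a boilerplate cluster, one bridge.
edges = [
    ("graph", "flow"), ("graph", "cut"), ("graph", "node"),
    ("flow", "cut"), ("flow", "node"), ("cut", "node"),
    ("node", "login"),
    ("login", "cookie"), ("login", "banner"), ("cookie", "banner"),
]
community = extract_community(edges, seeds={"graph", "flow"})
print(sorted(community))  # ['cut', 'flow', 'graph', 'node']
```

In this toy graph the cheapest cut severs the single bridge to the boilerplate cluster, so the number of words returned is decided by the graph structure and is not a fixed count chosen in advance. That is the sense in which such a cut can adapt to page size.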