Dalian University of Technology
张宪超 (Zhang Xianchao)

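Professor   Doctoral Supervisor   Master's Supervisor

Gender: Male

Alma mater: University of Science and Technology of China

Degree: Doctorate

Affiliation: School of Software; International School of Information and Software

Disciplines: Computer Application Technology; Software Engineering

Email: xczhang@dlut.edu.cn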

Automatic seed set expansion for trust propagation based anti-spam algorithms

Paper type: Journal article

Publication date: 2013-05-20

Journal: INFORMATION SCIENCES

Indexed in: SCIE, EI, Scopus

Volume: 232

Pages: 167-187

ISSN: 0020-0255

Keywords: Search engine; Link analysis; Web spam; Seed expansion; Trust propagating

Abstract: Seed sets are of significant importance to trust propagation based anti-spam algorithms, e.g., TrustRank. Conventional approaches require manual evaluation to construct a seed set, which restricts the seed set to a small size, since constructing a very large seed set manually would cost too much and may even be impossible. Small seed sets have a detrimental effect on the final ranking results. Thus, it is desirable to automatically expand an initial seed set to a larger one. In this paper, we propose an automatic seed set expansion algorithm (ASE) which enriches a small seed set to a much larger one. The intuition behind ASE is that if a page is recommended by a number of trustworthy pages, the page itself should be trustworthy as well. Since links on the Web can be considered a tool for conveying recommendation, we call links recommending the same page a joint recommendation link structure. The ASE algorithm employs joint recommendation link structures with sufficiently large support degrees to obtain new seeds. It can be proved that, using a joint recommendation link structure with a suitable support degree, the probability of selecting a spam page as a new seed is almost zero, so the quality of the expanded seed set can be guaranteed. Experimental results on the WEBSPAM-UK2007 dataset show that, with the same manual evaluation effort, ASE can automatically obtain a large number of high-quality reputable seeds and significantly improves the performance of trust propagation based algorithms such as TrustRank and CPV (Computing Page Values). (C) 2013 Elsevier Inc. All rights reserved.
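
The intuition stated in the abstract can be illustrated with a minimal Python sketch. This is not the ASE algorithm from the paper, whose exact definitions are not given on this page. It assumes, for illustration only, that the "support degree" of a page is the number of distinct seed pages linking to it, and that a page becomes a new seed once that count reaches a threshold. The names `expand_seeds`, `out_links`, and `min_support` are hypothetical.

```python
from collections import defaultdict


def expand_seeds(out_links, seeds, min_support):
    """Return the seed set enlarged by jointly recommended pages.

    Assumption (not from the paper): a page's support degree is the
    number of distinct seed pages that link to it.
    """
    seeds = set(seeds)
    support = defaultdict(int)
    for seed in seeds:
        # set() so that repeated links from one seed count only once
        for target in set(out_links.get(seed, ())):
            if target not in seeds:
                support[target] += 1
    new_seeds = {page for page, count in support.items() if count >= min_support}
    return seeds | new_seeds


# Toy link graph: keys are pages, values are the pages they link to.
out_links = {
    "a": ["x", "y"],
    "b": ["x", "y"],
    "c": ["x", "z"],
    "s": ["z"],  # "s" is not a seed, so its link adds no support
}

expanded = expand_seeds(out_links, {"a", "b", "c"}, min_support=3)
print(sorted(expanded))
```

In this toy graph only page "x" is linked by all three seeds, so it is the only page added at a threshold of 3. Lowering `min_support` admits more pages at the cost of weaker evidence for each one, which is the trade-off the abstract refers to when it speaks of a "suitable support degree".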
