Algorithm of the text copy detection based on topic bag

Indexed by:Conference paper

Date of Publication:2010-10-23

Included Journals:EI, Scopus

Volume:1

Page Number:285-288

Abstract:To address the serious problem of academic plagiarism on the web, this article proposes a text copy detection algorithm based on topic bags, drawing on the ideas of semantic clustering and multi-instance learning. First, a paper is represented as a three-layer tree: each leaf node denotes a sentence; each branch node represents a topic bag, formed by semantic clustering of several paragraphs; and the root node is the whole text. Second, the similarity between topic bags is calculated from the similarities of their sentences, and the similarity of two papers is then obtained from the similarities and weights of their topic bags. Experiments show that the proposed algorithm achieves higher accuracy. © 2010 IEEE.
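
The abstract gives no formulas, so the following is only a minimal sketch of the sentence, topic bag, and text hierarchy it describes, not the authors' implementation. Every concrete choice here is an assumption made for illustration: sentence similarity is taken to be Jaccard overlap of word sets, a topic bag's similarity to another bag is the average best sentence match, each bag's weight is its share of the paper's sentences, and the topic bags are supplied by hand rather than produced by semantic clustering. All function names are hypothetical.

```python
from typing import List

Sentence = str
TopicBag = List[Sentence]   # branch node: sentences grouped under one topic
Paper = List[TopicBag]      # root node: a text as a list of topic bags


def sentence_similarity(a: Sentence, b: Sentence) -> float:
    """Jaccard overlap of lowercase word sets (an assumed stand-in measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def bag_similarity(x: TopicBag, y: TopicBag) -> float:
    """Average, over sentences in x, of the best match found in y (assumed)."""
    if not x or not y:
        return 0.0
    return sum(max(sentence_similarity(s, t) for t in y) for s in x) / len(x)


def paper_similarity(p: Paper, q: Paper) -> float:
    """Weighted sum of each bag's best match in q.

    The weight of a bag is its share of the sentences in p (assumed).
    """
    total = sum(len(bag) for bag in p)
    if total == 0 or not q:
        return 0.0
    return sum(
        (len(bag) / total) * max(bag_similarity(bag, other) for other in q)
        for bag in p
    )


# Hand-built topic bags; the paper forms these by semantic clustering.
suspect = [["the cat sat on the mat", "dogs bark loudly"],
           ["stock prices fell today"]]
source = [["the cat sat on the mat"], ["bond yields rose"]]
score = paper_similarity(suspect, source)
```

In this sketch the measure is asymmetric: `paper_similarity(suspect, source)` asks how much of the suspect text is covered by the source, which is one plausible reading of copy detection but is not stated in the abstract.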
