江贺

个人信息Personal Information

教授

博士生导师

硕士生导师

性别:男

毕业院校:中国科技大学

学位:博士

所在单位:软件学院、国际信息与软件学院

联系方式:jianghe@dlut.edu.cn

扫描关注

论文成果

当前位置: jianghe >> 科学研究 >> 论文成果

Approximate pattern matching with gap constraints

点击次数:

论文类型:期刊论文

发表时间:2016-10-01

发表刊物:JOURNAL OF INFORMATION SCIENCE

收录刊物:SCIE、EI、SSCI、Scopus

卷号:42

期号:5

页面范围:639-658

ISSN号:0165-5515

关键字:Approximate pattern matching; Gap constraints; Length constraint; Hamming distance; Nettree

摘要:Pattern matching is a key issue in sequential pattern mining. Many researchers now focus on pattern matching with gap constraints. However, most of these studies involve exact pattern matching problems, a special case of approximate pattern matching and a more challenging task. In this study, we introduce an approximate pattern matching problem with Hamming distance. Its objective is to compute the number of approximate occurrences of pattern P with gap constraints in sequence S under similarity constraint d. We propose an efficient algorithm named Single-rOot Nettree for approximate pattern matchinG with gap constraints (SONG) based on a new non-linear data structure Single-root Nettree to effectively solve the problem. Theoretical analysis and experiments demonstrate an interesting law that the ratio M(P,S,d)/N(P,S,m) approximately follows a binomial distribution, where M(P,S,d) and N(P,S,m) are the numbers of the approximate occurrences whose distances to pattern P are d (0dm) and no more than m (the length of pattern P), respectively. Experimental results for real biological data validate the efficiency and effectiveness of SONG.