![]() |
个人信息Personal Information
教授
博士生导师
硕士生导师
主要任职:未来技术学院/人工智能学院副院长
性别:男
毕业院校:中国科技大学
学位:博士
所在单位:软件学院、国际信息与软件学院
联系方式:jianghe@dlut.edu.cn
Strict approximate pattern matching with general gaps
点击次数:
论文类型:期刊论文
发表时间:2015-04-01
发表刊物:APPLIED INTELLIGENCE
收录刊物:SCIE、EI、Scopus
卷号:42
期号:3
页面范围:566-580
ISSN号:0924-669X
关键字:Approximate pattern matching; Hamming distance; General gap; Online algorithm
摘要:Pattern matching with gap constraints is one of the essential problems in computer science such as music information retrieval and sequential pattern mining. One of the cases is called loose matching, which only considers the matching position of the last pattern substring in the sequence. One more challenging problem is considering the matching positions of each character in the sequence, called strict pattern matching which is one of the essential tasks of sequential pattern mining with gap constraints. Some strict pattern matching algorithms were designed to handle pattern mining tasks, since strict pattern matching can be used to compute the frequency of some patterns occurring in the given sequence and then the frequent patterns can be derived. In this article, we address a more general strict approximate pattern matching with Hamming distance, named SAP (Strict Approximate Pattern matching with general gaps and length constraints), which means that the gap constraints can be negative. We show that a SAP instance can be transformed into an exponential amount of the exact pattern matching with general gaps instances. Hence, we propose an effective online algorithm, named SETA (SubnETtree for sAp), based on the subnettree structure (a Nettree is an extension of a tree with multi-parents and multi-roots) and show the completeness of the algorithm. The space and time complexities of the algorithm are O(m x Maxlen x W x d) and O(Maxlen x W x m (2) x n x d), respectively, where m, Maxlen, W, and d are the length of pattern P, the maximal length constraint, the maximal gap length of pattern P and the approximate threshold. Extensive experimental results validate the correctness and effectiveness of SETA.