location: Current position: jianghe >> Scientific Research >> Paper Publications

Strict approximate pattern matching with general gaps

Hits:

Indexed by:期刊论文

Date of Publication:2015-04-01

Journal:APPLIED INTELLIGENCE

Included Journals:SCIE、EI、Scopus

Volume:42

Issue:3

Page Number:566-580

ISSN No.:0924-669X

Key Words:Approximate pattern matching; Hamming distance; General gap; Online algorithm

Abstract:Pattern matching with gap constraints is one of the essential problems in computer science such as music information retrieval and sequential pattern mining. One of the cases is called loose matching, which only considers the matching position of the last pattern substring in the sequence. One more challenging problem is considering the matching positions of each character in the sequence, called strict pattern matching which is one of the essential tasks of sequential pattern mining with gap constraints. Some strict pattern matching algorithms were designed to handle pattern mining tasks, since strict pattern matching can be used to compute the frequency of some patterns occurring in the given sequence and then the frequent patterns can be derived. In this article, we address a more general strict approximate pattern matching with Hamming distance, named SAP (Strict Approximate Pattern matching with general gaps and length constraints), which means that the gap constraints can be negative. We show that a SAP instance can be transformed into an exponential amount of the exact pattern matching with general gaps instances. Hence, we propose an effective online algorithm, named SETA (SubnETtree for sAp), based on the subnettree structure (a Nettree is an extension of a tree with multi-parents and multi-roots) and show the completeness of the algorithm. The space and time complexities of the algorithm are O(m x Maxlen x W x d) and O(Maxlen x W x m (2) x n x d), respectively, where m, Maxlen, W, and d are the length of pattern P, the maximal length constraint, the maximal gap length of pattern P and the approximate threshold. Extensive experimental results validate the correctness and effectiveness of SETA.

Pre One:Online hierarchical scheduling on two machines with known total size of low-hierarchy jobs

Next One:基于VMware的桌面虚拟化实验设计