Indexed by:Conference paper
Date of Publication:2016-01-01
Included Journals:CPCI-S
Page Number:1901-1903
Abstract:Homology search is a major application of bioinformatics in molecular biology, protein function analysis, and drug development. To perform batch searches against a growing database, the basic approach is to run BLAST on each original query or to concatenate queries into groups. This paper proposes an enhanced protein homology batch search algorithm with sequence compression and clustering (C2-BLASTP), which exploits the joint information among the query sequences as well as the database. In C2-BLASTP, the queries and the database are first compressed by redundancy analysis. The database is then clustered according to subsequence similarity, and hit finding is carried out in the clustered database. Furthermore, a final execution database is reconstructed from the potential hits to mitigate the growing scale of the sequence database. Finally, the homology batch search is performed in the execution database. Experiments on the NCBI NR database demonstrate the effectiveness of C2-BLASTP for homology batch search in terms of homology accuracy, search speed, and memory usage.
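
Illustrative sketch (not the authors' implementation): the abstract does not specify how redundancy analysis, clustering, or hit finding are carried out, so the Python below substitutes simple stand-ins to show the general shape of such a pipeline. Exact-duplicate collapsing stands in for compression, a shared k-mer index stands in for subsequence-similarity clustering, and a k-mer count threshold stands in for potential-hit selection. All function names, the parameters `k` and `min_shared`, and the example sequences are hypothetical.

```python
from collections import defaultdict


def compress(seqs):
    """Collapse exact duplicates: map each unique sequence to the IDs sharing it.

    Assumption: exact-match deduplication stands in for 'redundancy analysis'.
    """
    groups = defaultdict(list)
    for seq_id, seq in seqs.items():
        groups[seq].append(seq_id)
    return dict(groups)


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_kmer_index(unique_seqs, k):
    """Group database sequences by the k-mers they contain.

    Assumption: a k-mer index stands in for 'subsequence similarity' clustering.
    """
    index = defaultdict(set)
    for seq in unique_seqs:
        for kmer in kmers(seq, k):
            index[kmer].add(seq)
    return index


def execution_database(queries, database, k=3, min_shared=2):
    """Keep only database sequences sharing >= min_shared k-mers with the query batch."""
    q_unique = compress(queries)
    db_unique = compress(database)
    index = build_kmer_index(db_unique, k)

    # Pool k-mers over the whole batch so work shared between queries is done once.
    batch_kmers = set()
    for q in q_unique:
        batch_kmers |= kmers(q, k)

    # Count, per database sequence, how many batch k-mers it contains.
    shared = defaultdict(int)
    for kmer in batch_kmers:
        for seq in index.get(kmer, ()):
            shared[seq] += 1

    # The reduced database that a full aligner would then be run against.
    return {seq: ids for seq, ids in db_unique.items()
            if shared.get(seq, 0) >= min_shared}


# Hypothetical toy data: two identical queries, two identical database entries.
queries = {"q1": "MKTAYIAK", "q2": "MKTAYIAK"}
database = {"d1": "MKTAYIAKQR", "d2": "MKTAYIAKQR", "d3": "GGGGSGGGGS"}
reduced = execution_database(queries, database)
```

In this toy run, `reduced` retains only the sequence shared by `d1` and `d2`; the final alignment step (BLASTP in the paper) would then be run against that smaller set and is deliberately left out of the sketch.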