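Personal Information
Professor
Doctoral Supervisor
Master's Supervisor
Main Position: Secretary of the Party Committee, School of Computer Science and Technology
Gender: Male
Alma Mater: Jilin University
Degree: Ph.D.
Department: School of Computer Science and Technology
Discipline: Computer Application Technology
Office: Haishan Building A1022
Contact: hwge@dlut.edu.cn
Email: gehw@dlut.edu.cn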
Enhancing protein homology batch search algorithm with sequence compression and clustering
Paper Type: Conference Paper
Publication Date: 2016-01-01
Indexed by: CPCI-S
Pages: 1901-1903
Abstract: Homology search is an important application of bioinformatics in molecular biology, protein function analysis and drug development. To perform batch searches against a growing database, the basic approaches are to run BLAST on each original query separately or to group queries together and concatenate them. This paper proposes an enhanced protein homology batch search algorithm with sequence compression and clustering (C2-BLASTP), which exploits the joint information shared among the query sequences as well as the database. In C2-BLASTP, the queries and the database are first compressed by redundancy analysis. The database is then clustered according to subsequence similarity, and hit finding is performed on the clustered database. Next, a final execution database is reconstructed from the potential hits, which mitigates the effect of the growing size of the sequence database. Finally, the homology batch search is performed on the execution database. Experiments on the NCBI NR database demonstrate the effectiveness of C2-BLASTP for homology batch search in terms of homology accuracy, search speed and memory usage.
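The abstract only outlines the C2-BLASTP pipeline; the paper's actual redundancy analysis, clustering criterion and hit-finding procedure are not described here. The Python sketch below is a toy illustration of the overall data flow under stated assumptions: exact-duplicate removal stands in for redundancy analysis, greedy clustering by shared k-mers stands in for subsequence-similarity clustering, and shared k-mer counts stand in for BLAST alignment scores. The function names (`kmers`, `compress`, `cluster`, `batch_search`) and the parameters `k` and `min_shared` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def compress(seqs: list[str]) -> dict[str, list[int]]:
    """Toy redundancy removal (assumption): collapse exact duplicates,
    remembering the original positions of each unique sequence."""
    groups: dict[str, list[int]] = defaultdict(list)
    for idx, s in enumerate(seqs):
        groups[s].append(idx)
    return dict(groups)


def cluster(db: list[str], k: int, min_shared: int) -> list[dict]:
    """Toy greedy clustering (assumption): a sequence joins the first cluster
    whose representative shares at least min_shared k-mers with it."""
    clusters: list[dict] = []
    for s in db:
        ks = kmers(s, k)
        for c in clusters:
            if len(ks & c["rep_kmers"]) >= min_shared:
                c["members"].append(s)
                c["all_kmers"] |= ks  # union of member k-mers, used for screening
                break
        else:
            clusters.append({"rep_kmers": ks, "all_kmers": set(ks), "members": [s]})
    return clusters


def batch_search(queries: list[str], db: list[str], k: int = 3, min_shared: int = 2):
    """Hypothetical pipeline: returns (execution_db, hits), where hits maps each
    ORIGINAL query index to its best-scoring sequence (or None)."""
    uq = compress(queries)                    # 1. compress queries
    udb = list(compress(db))                  #    and the database
    clusters = cluster(udb, k, min_shared)    # 2. cluster the database
    q_kmers = set().union(*(kmers(q, k) for q in uq))
    # 3. cluster-level hit finding: keep clusters sharing any k-mer with any query
    # 4. execution database = members of the surviving clusters
    exec_db = [s for c in clusters if c["all_kmers"] & q_kmers for s in c["members"]]
    hits: dict[int, str | None] = {}
    for q, positions in uq.items():           # 5. search each unique query once
        qk = kmers(q, k)
        score, best = max(((len(qk & kmers(s, k)), s) for s in exec_db),
                          default=(0, None))
        for p in positions:                   # expand back to the original queries
            hits[p] = best if score > 0 else None
    return exec_db, hits


queries = ["MKVLAT", "GGHHPP", "MKVLAT"]
db = ["MKVLAS", "MKVLTT", "WWYYCC", "GGHHPQ"]
exec_db, hits = batch_search(queries, db)
print(exec_db)  # ['MKVLAS', 'MKVLTT', 'GGHHPQ']
print(hits)     # {0: 'MKVLAS', 2: 'MKVLAS', 1: 'GGHHPQ'}
```

In this toy run, the unrelated sequence `WWYYCC` is dropped before the final search, and the duplicate query `MKVLAT` is searched only once, with its result copied back to both original positions. A realistic version would replace the k-mer scoring with BLAST-style seeded alignment and statistical significance estimates.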