Hongwei Ge (葛宏伟)

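Personal Information

Professor

Doctoral Supervisor

Master's Supervisor

Main position: Party Secretary, School of Computer Science and Technology

Gender: Male

Alma mater: Jilin University

Degree: Ph.D.

Affiliation: School of Computer Science and Technology

Discipline: Computer Application Technology

Office: Room A832, Innovation Park Building

Contact: hwge@dlut.edu.cn

Email: gehw@dlut.edu.cn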


Publications


Enhancing protein homology batch search algorithm with sequence compression and clustering


Paper type: Conference paper

Publication date: 2016-01-01

Indexed in: CPCI-S

Pages: 1901-1903

Abstract: Homology search is an important application of bioinformatics in molecular biology, protein function analysis and drug development. To perform batch searches against a growing database, the basic approach is to run BLAST on each original query separately or to concatenate queries into groups. This paper proposes an enhanced protein homology batch search algorithm with sequence compression and clustering (C2-BLASTP), which exploits the joint information among the query sequences as well as the database. In C2-BLASTP, the queries and the database are first compressed by redundancy analysis. The database is then clustered according to subsequence similarity, and hit finding is performed on the clustered database. Next, a final execution database is reconstructed from the potential hits to mitigate the growing scale of the sequence database. Finally, the homology batch search is performed on the execution database. Experiments on the NCBI NR database demonstrate the effectiveness of C2-BLASTP for homology batch search in terms of homology accuracy, search speed and memory usage.
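The abstract describes the C2-BLASTP pipeline only at a high level. The Python sketch below illustrates the general idea of shrinking the search space before a batch search (compress, cluster, find candidate hits, build a reduced execution database). It is not the authors' implementation, and every technique in it is an assumption: "redundancy analysis" is approximated by collapsing identical sequences, "subsequence similarity" by greedy clustering on the Jaccard similarity of k-mer sets, and "hit finding" by counting k-mers shared with cluster representatives. All function names and parameters (`k`, `threshold`, `min_shared`) are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k=3):
    """Return the set of length-k substrings (k-mers) of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def compress(seqs):
    """Hypothetical stand-in for redundancy analysis: collapse identical sequences.

    Returns (list of unique sequences, map from unique index to original ids).
    """
    index = {}
    members = defaultdict(list)
    unique = []
    for sid, seq in seqs.items():
        if seq not in index:
            index[seq] = len(unique)
            unique.append(seq)
        members[index[seq]].append(sid)
    return unique, dict(members)


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(seqs, k=3, threshold=0.5):
    """Greedy clustering (assumed): a sequence joins the first representative
    whose k-mer Jaccard similarity is >= threshold, else it starts a new cluster."""
    reps = []       # list of (representative index, representative k-mer set)
    clusters = {}   # representative index -> member indices
    for i, seq in enumerate(seqs):
        ks = kmers(seq, k)
        for r, rks in reps:
            if jaccard(ks, rks) >= threshold:
                clusters[r].append(i)
                break
        else:
            reps.append((i, ks))
            clusters[i] = [i]
    return reps, clusters


def build_execution_db(queries, db_seqs, k=3, threshold=0.5, min_shared=2):
    """Select clusters whose representative shares at least `min_shared` k-mers
    with any unique query, and return the original database ids of all their
    members as a reduced "execution database"."""
    unique_queries, _ = compress(queries)  # duplicate queries are searched once
    unique_db, db_members = compress(db_seqs)
    reps, clusters = cluster(unique_db, k, threshold)
    query_kmers = [kmers(q, k) for q in unique_queries]
    selected = set()
    for r, rks in reps:
        if any(len(qk & rks) >= min_shared for qk in query_kmers):
            selected.update(clusters[r])
    # Expand compressed indices back to the original database ids.
    return sorted(sid for u in selected for sid in db_members[u])


if __name__ == "__main__":
    queries = {"q1": "MKTAYIAKQR", "q2": "MKTAYIAKQR"}
    db = {"d1": "MKTAYIAKQL", "d2": "MKTAYIAKQL", "d3": "GGGGWWWWPP"}
    print(build_execution_db(queries, db))  # ['d1', 'd2']
```

In this toy run the two identical queries collapse to one, the two identical database entries collapse to one cluster member, and the unrelated sequence `d3` is excluded. The returned ids would form the execution database against which the actual BLAST search is then run; that final step is omitted here.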