葛宏伟
Personal Homepage
Paper Publications
Fast batch searching for protein homology based on compression and clustering
Indexed by:Journal Article

Date of Publication:2017-11-21

Journal:BMC BIOINFORMATICS

Included Journals:SCIE, EI, PubMed

Volume:18

Issue:1

Page Number:508

ISSN No.:1471-2105

Key Words:Protein homology; Batch searching; Compression; Clustering

Abstract:Background: In the bioinformatics community, many tasks involve matching a set of protein query sequences against large sequence datasets. To conduct multiple queries against a database, a commonly used method is to run BLAST on each original query or on the concatenated queries. This is inefficient because it does not exploit the common subsequences shared by the queries.
   Results: We propose a compression- and clustering-based BLASTP (C2-BLASTP) algorithm to further exploit the joint information among the query sequences and the database. First, the queries and the database are compressed in turn through redundancy analysis, redundancy removal and distinction recording. Second, the database is clustered according to the Hamming distance among the subsequences. To improve the sensitivity and selectivity of sequence alignments, ten groups of reduced amino acid alphabets are used. Next, the hit-finding operator is applied to the clustered database. Furthermore, an execution database is constructed from the potential hits found, with the objective of mitigating the effect of the increasing scale of the sequence database. Finally, the homology search is performed in the execution database. Experiments on the NCBI NR database demonstrate the effectiveness of the proposed C2-BLASTP for batch homology searching in sequence databases. The results are evaluated in terms of homology accuracy, search speed and memory usage.
   Conclusions: C2-BLASTP achieves competitive results compared with some state-of-the-art methods.
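
The abstract mentions reduced amino acid alphabets and clustering of subsequences by Hamming distance. The following is a minimal sketch of those two ideas only, not the authors' C2-BLASTP implementation. The alphabet grouping in `REDUCED_GROUPS`, the greedy clustering strategy, and all function names are assumptions made for illustration; the paper's ten alphabet groups and its actual clustering procedure are not given here.

```python
from typing import Dict, List

# Hypothetical reduced alphabet: this grouping is an illustrative assumption,
# not one of the ten groupings referred to in the abstract.
REDUCED_GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "G", "P"]
# Each residue maps to the first letter of its group.
REDUCE: Dict[str, str] = {aa: group[0] for group in REDUCED_GROUPS for aa in group}


def reduce_sequence(seq: str) -> str:
    """Map each residue to its group representative; unknown symbols are kept."""
    return "".join(REDUCE.get(aa, aa) for aa in seq)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def cluster_by_hamming(words: List[str], max_dist: int) -> Dict[str, List[str]]:
    """Greedy clustering (an assumed strategy): a word joins the first centre
    within max_dist, otherwise it becomes a new centre."""
    clusters: Dict[str, List[str]] = {}
    for word in words:
        for centre in clusters:
            if hamming(centre, word) <= max_dist:
                clusters[centre].append(word)
                break
        else:
            clusters[word] = [word]
    return clusters
```

Under these assumptions, fixed-length subsequences of a database would first be passed through `reduce_sequence` and then grouped with `cluster_by_hamming`, so that a hit-finding step could compare a query word against cluster centres instead of every subsequence.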

Personal information

Professor
Supervisor of Doctorate Candidates
Supervisor of Master's Candidates

Main positions:Party Committee Secretary, School of Computer Science and Technology

Gender:Male

Alma Mater:Jilin University

Degree:Doctoral Degree

School/Department:School of Computer Science and Technology

Discipline:Computer Applied Technology

Business Address:Haishan Building, Room A1022

Contact Information:hwge@dlut.edu.cn


Address: No.2 Linggong Road, Ganjingzi District, Dalian City, Liaoning Province, P.R.C., 116024
