[Biopython] slow pairwise2 alignment

Peter biopython at maubp.freeserve.co.uk
Sat Jun 6 10:14:49 UTC 2009


On Fri, Jun 5, 2009 at 9:34 PM, Ogan ABAAN <oda.gumail at gmail.com> wrote:
> Hello everyone
>
> I am relatively new to Python/Biopython, but I am learning quickly. So you
> may see me sending questions your way every once in a while. Please be
> patient with me :)
>
> I have a naive question regarding the use of pairwise2. I am trying to get
> alignment scores for two 22mer primer sequences over a few million short
> DNA sequences using pairwise2. To speed things up I am using the
> 'score_only=1' argument. I am averaging about 5-6 min per 500,000 sequences.

So doing a few million sequences takes under 25 minutes? That doesn't
sound too bad.
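As a quick check of that estimate, here is a back-of-the-envelope sketch. It assumes runtime scales linearly with the number of sequences and takes "a few million" to mean about two million; the function name `estimate_minutes` is purely illustrative.

```python
def estimate_minutes(n_sequences: int, minutes_per_batch: float,
                     batch_size: int = 500_000) -> float:
    """Scale an observed per-batch runtime linearly up to n_sequences."""
    return n_sequences / batch_size * minutes_per_batch


# Assumed workload: two million sequences at the reported 5-6 min per 500,000.
for rate in (5, 6):
    print(f"{rate} min/batch -> {estimate_minutes(2_000_000, rate):.0f} min")
```

This prints 20 and 24 minutes; larger sequence counts grow proportionally under the same linear assumption.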

If you need to speed this up further you might look at other pairwise
alignment tools (e.g. EMBOSS needle?) but the overhead of parsing their
output may outweigh any raw speed advantage.

If you can show us your python script we *might* be able to suggest other
areas for improvement.
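For reference, here is a minimal pure-Python sketch of the score that `globalxx(..., score_only=1)` reports. It is an illustration, not Biopython's implementation, and the name `globalxx_score` is hypothetical. With the "xx" scoring, identical characters score 1 while mismatches and gaps cost nothing, so the best global score equals the length of the longest common subsequence. Each call fills a table of roughly len(primer) * len(read) cells, which is why millions of calls add up.

```python
def globalxx_score(seq_a: str, seq_b: str) -> int:
    """Best global alignment score with match=1, mismatch=0, no gap penalties.

    Under that scoring the optimum equals the length of the longest common
    subsequence (LCS), computed here with a two-row dynamic programme.
    """
    # prev holds the DP row for the characters of seq_a processed so far.
    prev = [0] * (len(seq_b) + 1)
    for char_a in seq_a:
        curr = [0] * (len(seq_b) + 1)
        for j, char_b in enumerate(seq_b, start=1):
            if char_a == char_b:
                # A match extends the diagonal by one point.
                curr[j] = prev[j - 1] + 1
            else:
                # Otherwise carry over the best score from a free gap.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


print(globalxx_score("ACGT", "AGT"))  # 3
```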

> I also found online that the C module could speed things up further. When
> I load cpairwise2, no error message is displayed, suggesting that it
> has been loaded.

If you use Bio.pairwise2 it will automatically use the compiled C code
(assuming it is available - which it seems to be in your case).

> However, when I do cpairwise2.align.globalxx(seq1,seq2) I get the error
> message "AttributeError: 'module' object has no attribute 'align'". So does
> that mean cpairwise2 is not loaded? I would appreciate it if someone could
> help me with this.

No, you are just not expected to call cpairwise2 directly, as Bio.pairwise2
does this for you.
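The general idea of "use the compiled code if it is there, otherwise fall back to Python" can be sketched as below. This is a generic pattern under stated assumptions, not Biopython's actual code: `_fast_matches` and `count_matches` are hypothetical names. The point is that callers only use the public wrapper, while the compiled module supplies low-level helpers and need not provide the public names (just as cpairwise2 has no `align`).

```python
def _count_matches_py(seq_a: str, seq_b: str) -> int:
    # Pure-Python fallback: count positions where the two sequences agree.
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


try:
    # Hypothetical compiled accelerator exposing only a low-level helper.
    from _fast_matches import count_matches as _count_matches_impl
    USING_COMPILED = True
except ImportError:
    # No compiled module available: silently use the Python version.
    _count_matches_impl = _count_matches_py
    USING_COMPILED = False


def count_matches(seq_a: str, seq_b: str) -> int:
    """Public entry point; callers never import the accelerator directly."""
    return _count_matches_impl(seq_a, seq_b)


print(count_matches("ACGT", "ACCT"), USING_COMPILED)
```

Because the choice happens once at import time, calling code is identical whether or not the compiled helper is present.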

Peter
