[Biopython] pairwise sequence alignment programs in biopython

Peter Cock p.j.a.cock at googlemail.com
Wed Jul 11 07:52:09 UTC 2018


To clarify the point about sequence length (I had forgotten the details), see:

https://github.com/biopython/biopython/pull/1655#issuecomment-390180240

If you just want the alignment lengths, the new Align.PairwiseAligner
wins; if you want the alignments themselves, then pairwise2 wins.

On the other hand, with random sequences of 5000 bp, Michiel
reported that his new Align.PairwiseAligner was faster.
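
For illustration, here is a minimal sketch of using Align.PairwiseAligner
with custom gap penalties, first to get only a score and then to get the
alignments themselves. It assumes a recent Biopython release (attribute names
such as match_score come from later releases and may differ in 1.72). The
sequences and scoring values are made up for the example:

```python
from Bio import Align

# Made-up example sequences; "X" could stand in for a non-standard residue.
seq_a = "GATTACA"
seq_b = "GATCA"

aligner = Align.PairwiseAligner()
aligner.mode = "global"          # end-to-end alignment; "local" is the alternative
aligner.match_score = 1.0        # illustrative scoring values, not recommendations
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0    # score for the first position of a gap
aligner.extend_gap_score = -0.5  # score for each further gap position

# Use case 1: only the optimal score, without building any alignment.
score = aligner.score(seq_a, seq_b)
print("score:", score)

# Use case 2: the alignments themselves (taking only the first optimal one).
best = next(iter(aligner.align(seq_a, seq_b)))
print(best)
```

Timing the two calls on your own sequences is probably the most reliable
way to see which approach suits your data.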

How much memory (RAM) do you have, and are you using a
32-bit operating system? It is likely memory limits that are stopping
you aligning sequences over about 2000 residues.
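
As a rough way to check both points, here is a small sketch using only the
standard library. It reports whether the running Python is a 32-bit or 64-bit
build (which is what limits the address space, whatever the operating system),
and gives a back-of-envelope size for one full dynamic programming matrix.
The helper names and the 8 bytes per cell are assumptions for illustration,
not a measurement of what pairwise2 actually allocates:

```python
import struct


def pointer_bits():
    """Return 32 or 64 depending on the pointer size of this Python build."""
    return struct.calcsize("P") * 8


def dp_matrix_bytes(len_a, len_b, bytes_per_cell=8):
    """Estimate the size of one (len_a + 1) x (len_b + 1) score matrix.

    bytes_per_cell=8 is an assumed lower bound (one C double per cell);
    a matrix built from Python objects would need considerably more.
    """
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


print(f"Python build: {pointer_bits()}-bit")
for n in (2000, 2500, 5000):
    mib = dp_matrix_bytes(n, n) / 2**20
    print(f"{n} x {n} residues: about {mib:.0f} MiB per matrix (lower bound)")
```

The quadratic growth is the point here: doubling the sequence length
roughly quadruples the memory needed for a full matrix.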

Peter

On Wed, Jul 11, 2018 at 12:12 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi John,
>
> The Align.PairwiseAligner code is new in Biopython 1.72, and
> better support for longer sequences was one of the improvements.
>
> You would probably find it useful to read over the pull request:
> https://github.com/biopython/biopython/pull/1655
>
>
> Peter
>
> On Tue, Jul 10, 2018 at 7:51 PM, John Berrisford <jmb at ebi.ac.uk> wrote:
>> Hi
>>
>>
>>
>> I’m looking at performing pairwise alignments of polymer sequences in
>> biopython.
>>
>> These will be protein or nucleotide sequences. They may include non-standard
>> residues which will be denoted as X.
>>
>> The sequences will be of varying length from around 20 residues up to
>> several thousand residues – put simply the range of sequences in the PDB.
>>
>>
>>
>> I’m looking for the best tool to use to do this in biopython
>>
>>
>>
>> So far I have performed tests with pairwise2 and Align.PairwiseAligner.
>>
>> From my tests it seems that pairwise2 has a limit of ~2000 residues – i.e.
>> if I give it a sequence of 2500 residues to compare against itself it
>> crashes. PairwiseAligner seems to be able to handle much longer sequences
>> without issue.
>>
>>
>>
>> I need to be able to set gap penalties – which is possible in both of these
>> programs.
>>
>>
>>
>> So my questions are:
>>
>> Are these the only options in Biopython? I would prefer a Python
>> implementation rather than something that requires external compilation, e.g.
>> EMBOSS Needle
>>
>> Are these the best options?
>>
>> Are they both maintained / stable?
>>
>> Are they comparable in their results?
>>
>> Is the limitation on sequence length in pairwise2 a known issue? A quick
>> Google search suggests most people use pairwise2, which is strange given its
>> sequence length limitation.
>>
>>
>>
>> Thank you
>>
>>
>>
>> John
>>
>>
>>
>> --
>>
>> John Berrisford
>>
>> PDBe
>>
>> European Bioinformatics Institute (EMBL-EBI)
>>
>> European Molecular Biology Laboratory
>>
>> Wellcome Genome Campus
>>
>> Hinxton
>>
>> Cambridge CB10 1SD UK
>>
>> Tel: +44 1223 492529
>>
>>
>>
>> https://www.pdbe.org
>>
>> https://www.facebook.com/proteindatabank
>>
>> https://twitter.com/PDBeurope
>>
>>
>>
>>
>> _______________________________________________
>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biopython


