[Biopython-dev] Bio.AlignIO for sequence alignment input/output

Peter biopython at maubp.freeserve.co.uk
Thu May 15 13:48:26 UTC 2008


Those of you subscribed to the CVS update feed (see
http://biopython.org/wiki/Tracking_CVS_commits and the RSS link) will
have noticed some activity in Bio.AlignIO which I originally proposed
adding a year ago.  See also enhancement Bug 2285,
http://bugzilla.open-bio.org/show_bug.cgi?id=2285

I've been using this code on and off in my own work, and have put
together a reasonable unit test.  I've finished a first draft of a new
chapter in the tutorial describing the module (you'll need to run
pdflatex or hevea on biopython/Doc/Tutorial.tex to read this), and
started a wiki page too: http://www.biopython.org/wiki/AlignIO

The API is deliberately very close to that of Bio.SeqIO, but deals
with Alignment objects rather than SeqRecord objects.  I'm hoping for
some feedback now, even if it is just pointing out typos in the
documentation.  Additional example input files would also be welcome,
as would checks that third-party tools can read the output Biopython
writes.

One particular issue with the API is handling ambiguous FASTA files
which have been used to store more than one alignment (discussed in
the updated tutorial).  There is an optional argument to the
Bio.AlignIO.parse() function to specify the number of sequences
expected per alignment which covers the most typical scenarios.  I am
open to the idea of simply removing this option, which would mean that if the
user really wants to parse one of these ambiguous files, they would have
to read in the individual sequences using Bio.SeqIO, batch them as
needed, and then create the alignments.
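
A minimal sketch of that manual fallback (read the individual sequences,
batch them, then build the alignments), written in plain Python rather
than against Biopython so that no Bio.AlignIO or Bio.SeqIO signatures
are assumed.  The names read_fasta and batch_alignments are hypothetical;
seq_count mirrors the optional argument described above.  It also shows
where the ambiguity comes from: nothing in the file marks where one
alignment ends, so the caller has to supply the count.

```python
# Hypothetical sketch (not Biopython code): split a FASTA file holding several
# alignments into batches of seq_count sequences, then sanity-check each batch.
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (identifier, sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            name = words[0] if words else ""  # identifier = first word of header
            chunks = []
        elif name is None:
            raise ValueError("Sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        yield name, "".join(chunks)


def batch_alignments(records: Iterable[Record], seq_count: int) -> Iterator[List[Record]]:
    """Group consecutive records into alignments of exactly seq_count sequences."""
    if seq_count < 1:
        raise ValueError("seq_count must be at least 1")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == seq_count:
            # Every row of an alignment must have the same length (gaps included).
            if len({len(seq) for _, seq in batch}) != 1:
                raise ValueError("Sequences within one alignment differ in length")
            yield batch
            batch = []
    if batch:
        raise ValueError(
            f"{len(batch)} sequence(s) left over; total is not a multiple of {seq_count}"
        )


example = """>a
AC-GT
>b
ACGGT
>c
TT-A
>d
TTGA
"""

for i, alignment in enumerate(batch_alignments(read_fasta(example), seq_count=2)):
    print(i, [name for name, _ in alignment])
# 0 ['a', 'b']
# 1 ['c', 'd']
```

With seq_count=4 the same file fails the length check instead.  A file
of four equal-length rows, however, would be accepted both as one
alignment and as two, which is exactly the case the option exists to
resolve.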

Peter
