[BioPython] annotations in an Alignment object

Mon Nov 10 11:28:00 UTC 2008

On Mon, Nov 10, 2008 at 11:04 AM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> Is there any way to store some annotations in an Alignment object??
> For example: the alignment tool used, its parameters, its version, the
> date, and the nature of the sequence aligned.

Not officially, no.  This is on my mental list of things to do with
the alignment object (after Biopython 1.49 is done).  I've CC'd the
dev-mailing list which is probably a better place to discuss the
details.

If you look at Bio/AlignIO/StockholmIO.py or the
Bio/AlignIO/FastaIO.py code you'll see I've recorded this kind of
information in a private dictionary, i.e. alignment._annotations.
This makes the data available if anyone really needs it, but signals
that this is not part of the public API and is likely to change.
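
If you do want to peek at that private dictionary, a defensive read is
probably wise, since the attribute may be missing or renamed in a later
release.  A minimal sketch, assuming only the attribute name given above
(`_annotations`); the helper name `get_private_annotations` is made up
for illustration and is not part of Biopython:

```python
def get_private_annotations(alignment):
    """Return a copy of alignment._annotations, or {} if it is absent.

    _annotations is a private attribute, so it may not exist at all
    (or may be None); treat both cases as "no annotations".
    """
    private = getattr(alignment, "_annotations", None)
    return dict(private or {})


# Stand-in object for demonstration only; a real alignment parsed by
# Bio.AlignIO would take its place.
class _FakeAlignment(object):
    pass


fake = _FakeAlignment()
fake._annotations = {"tool": "example-aligner", "version": "0.0"}
with_annotations = get_private_annotations(fake)
without_annotations = get_private_annotations(_FakeAlignment())
```

Returning a copy means that editing the result never touches the
alignment's own (unofficial) state.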

As part of an alignment annotation enhancement, we should try to
establish some agreed standards for naming annotation entries (and
also for counting systems).

> I am asking this because I would like to write a module to create
> ldhat input files from an alignment program.
> A ldhat file (http://www.stats.ox.ac.uk/~mcvean/LDhat/instructions.html)
> is very similar to a fasta file; the only difference is that in its
> first line, it contains three numbers, one of which can't always be
> inferred by the data.

Why go to the trouble of making a new Bio.AlignIO module?  Judging by
this example from the LDhat manual, the format looks like a FASTA file
with one extra header line:

4 10 1
>SampleA
TCCGC??RTT
>SampleB
TACGC??GTA
>SampleC
TC?-CTTGTA
>SampleD
TCC-CTTGTT

Rather than writing support for a whole new file format, wouldn't it
be easier to do something like this?

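alignment = ...  # an alignment you have already loaded or built

# The three numbers for the LDhat header line
number_a = 4
number_b = 10
number_c = 1

# Write the header line first, then the alignment as ordinary FASTA
with open("example.txt", "w") as handle:
    handle.write("%i %i %i\n" % (number_a, number_b, number_c))
    handle.write(alignment.format("fasta"))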
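
The same idea can be sketched without Biopython at all.  This is only
an illustration, assuming (from the example above, which has 4
sequences of 10 columns) that the first two header numbers are the
sequence count and the alignment length.  The third number is passed in
by the caller, since it can't be inferred from the data.  The function
name `write_ldhat` and its arguments are hypothetical:

```python
import io


def write_ldhat(records, third_number, handle):
    """Write (name, sequence) pairs as FASTA with an LDhat-style header.

    Assumption: the header is "<number of sequences> <alignment length>
    <third_number>", matching the example from the LDhat manual.
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one sequence")
    length = len(records[0][1])
    # Every row of an alignment should have the same number of columns
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all sequences must have the same length")
    handle.write("%i %i %i\n" % (len(records), length, third_number))
    for name, seq in records:
        handle.write(">%s\n%s\n" % (name, seq))


# The four sequences from the LDhat manual example
example = [
    ("SampleA", "TCCGC??RTT"),
    ("SampleB", "TACGC??GTA"),
    ("SampleC", "TC?-CTTGTA"),
    ("SampleD", "TCC-CTTGTT"),
]
buffer = io.StringIO()
write_ldhat(example, 1, buffer)
ldhat_text = buffer.getvalue()
```

With a Biopython alignment, alignment.get_alignment_length() should
give the second number, so only the third one needs to come from the
user.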

Peter