[Biopython-dev] Beta code in the official releases?

Peter Cock p.j.a.cock at googlemail.com
Thu Sep 6 00:10:57 UTC 2012


On Wed, Sep 5, 2012 at 8:19 PM, Sczesnak, Andrew wrote:
> Yeah, it would be great if this module could finally be included.
> I've e-mailed the list numerous times asking what would be
> necessary to include it and have done all you and Brad have
> asked. I've watched you include bits and pieces of code from
> other contributors quickly and without much scrutiny, so I
> can't help but feel singled out. What is the logic in delaying
> this? We've heard from people who are already using the
> code and have asked when it will be pulled. Is it serving the
> community to not even include the basic reader/writer? Am
> I wasting my time? Is it your goal to actively discourage
> contributions?

In my mind, the main technical issue regarding MAF and AlignIO
and the common alignment object is the lack of a common way
of handling the idea of start/end (and sometimes strand) for
each sequence (in a consistent co-ordinate system using Python
counting, i.e. a zero-based start and an exclusive end).
Evidently I haven't managed to convey my interpretation/concern
adequately.

Some file formats, like the EMBOSS alignment formats, have these
numbers explicitly, but we're not parsing them:
http://lists.open-bio.org/pipermail/biopython/2012-September/008142.html

In the case of "fasta-m10" the numbers are stored in private
properties as a 'short term' hack:
http://lists.open-bio.org/pipermail/biopython-dev/2012-June/009744.html

Others, like Stockholm, have identifier/start-end as a combined
name (but this is not mandatory). Here the start and end are
stored in the annotations dictionary (as unparsed strings,
still using 1-based co-ordinates).
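
To make the conversion concrete: a Stockholm label such as
"O83071/192-246" uses 1-based, inclusive co-ordinates, so moving
to Python counting means subtracting one from the start and keeping
the end. A minimal sketch, assuming the suffix is well formed; the
helper name split_stockholm_name is hypothetical (not Bio.AlignIO
API), and treating start > end as the reverse strand is an
assumption borrowed from some databases, not a Stockholm rule.

```python
import re
from typing import Optional, Tuple

# Matches a trailing "/start-end" suffix, e.g. "O83071/192-246".
_SUFFIX = re.compile(r"^(?P<name>.+)/(?P<start>\d+)-(?P<end>\d+)$")


def split_stockholm_name(label: str) -> Tuple[str, Optional[Tuple[int, int, int]]]:
    """Split a Stockholm-style label into (identifier, (start, end, strand)).

    Input co-ordinates are 1-based and inclusive; the output uses Python
    counting (0-based start, exclusive end). Assumption: start > end is
    treated as the reverse strand (strand -1).
    Returns (label, None) when there is no start-end suffix.
    """
    match = _SUFFIX.match(label)
    if match is None:
        return label, None
    first, second = int(match.group("start")), int(match.group("end"))
    if first <= second:
        return match.group("name"), (first - 1, second, +1)
    # Reverse strand: report the same span in forward-strand terms.
    return match.group("name"), (second - 1, first, -1)


print(split_stockholm_name("O83071/192-246"))  # ('O83071', (191, 246, 1))
print(split_stockholm_name("O83071"))          # ('O83071', None)
```

Returning None rather than guessing keeps "no co-ordinates given"
distinct from "co-ordinates covering the whole sequence".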

In MAF the start/end are explicit and much more important.
It would be nearly pointless to parse the file while ignoring these.
Maybe your approach is good enough for MAF, and we
should have adopted it as is, and delayed better integration
with the other AlignIO formats?
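
For comparison, per the UCSC MAF description an "s" line gives
the source name, a zero-based start, the number of non-gap bases,
the strand, the source length and the aligned text, and a
minus-strand start is counted from the end of the
reverse-complemented source. A minimal sketch mapping such a line
to forward-strand Python co-ordinates; MafRow and parse_maf_s_line
are hypothetical names, not Bio.AlignIO API.

```python
from typing import NamedTuple


class MafRow(NamedTuple):
    src: str
    start: int   # 0-based, forward strand
    end: int     # exclusive, forward strand
    strand: int  # +1 or -1
    text: str    # aligned (gapped) sequence


def parse_maf_s_line(line: str) -> MafRow:
    """Parse a MAF 's' line into forward-strand Python co-ordinates."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError(f"Not a MAF 's' line: {line!r}")
    _, src, start, size, strand, src_size, text = fields
    start, size, src_size = int(start), int(size), int(src_size)
    if strand == "+":
        return MafRow(src, start, start + size, +1, text)
    if strand == "-":
        # Minus-strand starts are counted from the end of the source.
        return MafRow(src, src_size - start - size, src_size - start, -1, text)
    raise ValueError(f"Unexpected strand {strand!r}")


row = parse_maf_s_line("s mm4.chr6 53310102 13 - 151104725 ACAGCTGAAAATA")
print(row.start, row.end, row.strand)  # 97794610 97794623 -1
```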

i.e. this is a general limitation in AlignIO and the object
model, somewhat annoying in the formats already supported,
but critical for MAF, where this information is essential.

I was expecting a convention for this to fall out of Bow's GSoC
work for 'pairwise alignments' in SearchIO - but the object
model he came up with was not SeqRecord based (many
of the file formats he was using didn't include sequences).

Right now my inclination is still to add a location property to
the SeqRecord, usually a FeatureLocation, but it could also
be the proposed CompoundLocation for more complex cases.
The question then is if/when this would be propagated, e.g.
when slicing or adding SeqRecord objects.
http://lists.open-bio.org/pipermail/biopython-dev/2012-May/009646.html
http://lists.open-bio.org/pipermail/biopython-dev/2012-July/009803.html
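
The propagation question can be illustrated with a hypothetical
row type (LocatedRow is not Biopython API; in Biopython the
location itself would presumably be a FeatureLocation). A minimal
sketch, assuming the location is stored on the forward strand with
Python counting and that "-" is the only gap character: slicing
alignment columns must count only the non-gap letters skipped and
kept, and on the reverse strand the first column corresponds to the
highest forward co-ordinate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocatedRow:
    """Hypothetical gapped alignment row carrying a forward-strand location.

    start/end use Python counting on the forward strand of the source
    sequence; strand is +1 or -1. One possible propagation rule only,
    not Biopython's SeqRecord behaviour.
    """
    text: str
    start: int
    end: int
    strand: int = +1

    def __post_init__(self):
        letters = len(self.text) - self.text.count("-")
        if self.end - self.start != letters:
            raise ValueError("location length must equal the number of non-gap letters")

    def __getitem__(self, cols: slice) -> "LocatedRow":
        if not isinstance(cols, slice) or cols.step not in (None, 1):
            raise TypeError("only contiguous column slices are supported")
        first, last, _ = cols.indices(len(self.text))
        last = max(last, first)  # an empty slice keeps no letters
        before = self.text[:first]
        sub = self.text[first:last]
        skipped = len(before) - before.count("-")
        kept = len(sub) - sub.count("-")
        if self.strand == +1:
            new_start = self.start + skipped
            return LocatedRow(sub, new_start, new_start + kept, +1)
        # Reverse strand: columns run from high to low forward co-ordinates.
        new_end = self.end - skipped
        return LocatedRow(sub, new_end - kept, new_end, -1)


print(LocatedRow("AC-GT", 10, 14)[2:5])
# LocatedRow(text='-GT', start=12, end=14, strand=1)
```

Addition would need a similar rule (e.g. only when the two
locations are adjacent on the same source), which is exactly the
kind of design choice still open here.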

So the wheels are turning, but slowly. I have not had as
much time to dedicate to this as I would like - but other
smaller or less inter-connected things are much easier to
review and merge.

Peter


