[Biopython-dev] [Biopython] Bio.Sequencing.Ace

Peter Cock p.j.a.cock at googlemail.com
Tue Jun 30 08:01:28 UTC 2009


On Mon, Jun 29, 2009 at 4:16 PM, Jose Blanca<jblanca at btc.upv.es> wrote:
>> Are you using Bio.Sequencing.Ace in your code, or did you write a whole
>> new parser instead?
> I wrote one, because I wanted to be able to get one particular contig or just
> the contig or the read names. But I don't think that is a problem. I guess
> that the Biopython parser could easily be adapted to do that.

I see. This touches on the indexing discussion - the same idea on
this thread would probably work on Ace files too:
http://lists.open-bio.org/pipermail/biopython/2009-June/005275.html
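
As a rough illustration of what indexing an Ace file could mean, here is a minimal sketch. It relies only on the fact that each contig in an Ace file starts with a line beginning "CO <name> ...". The function name index_ace_contigs and the tiny in-memory demo file are hypothetical and are not part of Bio.Sequencing.Ace.

```python
import io


def index_ace_contigs(handle):
    """Map each contig name to the byte offset of its 'CO' line.

    Hypothetical helper: 'handle' must be opened in binary mode so that
    tell() and seek() work with plain byte offsets.
    """
    index = {}
    while True:
        offset = handle.tell()      # position of the line about to be read
        line = handle.readline()
        if not line:                # end of file
            break
        if line.startswith(b"CO "):
            name = line.split()[1].decode("ascii")
            index[name] = offset
    return index


# Made-up stand-in for a real Ace file, just to exercise the function.
demo = io.BytesIO(
    b"AS 2 3\n\nCO Contig1 4 1 1 U\nACGT\n\nCO Contig2 3 2 1 U\nTTT\n"
)
ace_index = index_ace_contigs(demo)

# Jump straight to the second contig without re-reading the first.
demo.seek(ace_index["Contig2"])
first_line = demo.readline()
```

With such an index, fetching one particular contig or just listing the contig names needs a single pass over the file, after which a parser could be pointed at the stored offset.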

>> Now that I have been using Ace files in my own work, I've been meaning
>> to look over your stuff. In some ways, a contig class can be seen as a
>> generalisation of a multiple sequence alignment class. Certainly this is
>> something we should improve in Biopython (as you might gather from
>> some of the enhancement bugs on bugzilla, I have lots of ideas for the
>> current alignment class), and I'm sure you have some great ideas too.
>
> I think this is the main deviation from Biopython. The contig class is
> similar to an alignment class; in fact, my contig classes should be compatible
> with your new alignment API proposal.

That's good. I agree that a specialised contig class that works like
the traditional multiple sequence alignment class would be nice.
It would then make sense to have Bio.AlignIO handle contigs as
well as traditional multiple sequence alignments.

> alignment.
> seq1 +++++++++>
> seq2 +++++++++>
> seq3 +++++++++>
>
> contig
> seq1 ++++>
> seq2    +++++>
> seq3        ++++++>
>
> Basically every read has a different coordinate system in the contig case.
> What I've done is to create a class named LocatableSequence that is a
> container for sequence objects. It works like:
> >>> seq1 = 'ATCG'
> >>> locseq1 = locate_sequence(seq1, location=10)
> >>> locseq1[10] == 'A'
> In that way the contig is a list of LocatableSequences and the coordinate
> system transformations are done by the LocatableSequences, not by the contig.
> The LocatableSequences also allow for masks.
> The LocatableSequence works with any sequence like objects, strs, Seq,
> SeqRecord, lists, etc.
> There's also a Location class that represents a fragment of a sequence. My
> Location class is more limited than the one in the Biopython SeqFeature. In
> my case the start and end should be integers. I use this class to represent
> the region not masked in the sequence and the Location of the sequence inside
> the LocatableSequence.
> Take a look at Contig.py and at LocatableSequence.py, these are the most
> relevant classes for this.
> Best regards,
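
To make the quoted description concrete, here is a minimal sketch of how a LocatableSequence-style wrapper might behave. This is not Jose's actual code: only the names LocatableSequence and locate_sequence and the location=10 example come from his message. The attributes, the inclusive end coordinate, and the error handling are assumptions, and masks are left out entirely.

```python
class LocatableSequence:
    """Wrap any indexable sequence and address it in contig coordinates.

    Sketch only: 'start' and 'end' are assumed attribute names, and the
    end coordinate is assumed to be inclusive.
    """

    def __init__(self, sequence, location=0):
        self.sequence = sequence    # str, list, or any sequence-like object
        self.start = location       # contig position of the first element

    @property
    def end(self):
        # Contig position of the last element (inclusive).
        return self.start + len(self.sequence) - 1

    def __getitem__(self, contig_pos):
        if not isinstance(contig_pos, int):
            raise TypeError("this sketch supports integer positions only")
        if not self.start <= contig_pos <= self.end:
            raise IndexError("position %d lies outside this read" % contig_pos)
        # Translate the contig coordinate into the read's own coordinate.
        return self.sequence[contig_pos - self.start]


def locate_sequence(sequence, location=0):
    """Convenience factory matching the call shown in the quoted example."""
    return LocatableSequence(sequence, location)


locseq1 = locate_sequence("ATCG", location=10)
```

Here locseq1[10] gives the first base of the read and locseq1[13] the last, so the coordinate translation lives in the wrapper rather than in the contig.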

I'll have to make some time to look at your code.

What I was thinking of was a contig class as an alignment subclass,
holding a list of SeqRecord objects and offsets. The consensus might
just be one element of this list - but could be handled specially. This
sounds simpler than having to introduce a whole new object system,
related to but different to SeqFeature objects. However, I don't yet
have a sample implementation to demonstrate this.
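
Purely to illustrate the shape of the idea, and not as a proposed implementation, a list of reads plus offsets could be sketched as below. The class name OffsetContig and its methods are hypothetical. Plain strings stand in for SeqRecord objects so the example is self-contained, it does not subclass any Biopython alignment class, and "." is an arbitrary choice of marker for "no coverage".

```python
class OffsetContig:
    """Hypothetical contig: a list of reads, each with a start offset."""

    def __init__(self):
        self._rows = []     # list of (name, sequence, offset) tuples

    def add_read(self, name, sequence, offset):
        if offset < 0:
            raise ValueError("offset must be non-negative in this sketch")
        self._rows.append((name, sequence, offset))

    def get_alignment_length(self):
        # Number of columns needed to hold every read at its offset.
        return max((off + len(seq) for _, seq, off in self._rows), default=0)

    def padded_rows(self, pad="."):
        # View the contig as a rectangular alignment by padding both ends.
        width = self.get_alignment_length()
        return [(name, (pad * off + seq).ljust(width, pad))
                for name, seq, off in self._rows]

    def column(self, pos, pad="."):
        # One alignment column; reads not covering 'pos' contribute 'pad'.
        return "".join(row[pos] for _, row in self.padded_rows(pad))


# Three staggered reads, laid out like the contig diagram quoted above.
contig = OffsetContig()
contig.add_read("seq1", "ACGT", 0)
contig.add_read("seq2", "TTGCA", 3)
contig.add_read("seq3", "CAGGTC", 7)
```

In this toy layout the contig is 13 columns wide, and contig.column(3) covers the overlap between seq1 and seq2. A consensus could be stored as just another row or handled specially.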

One important thing I think we should do BEFORE adding any contig
class to Biopython, is get it working with at least one other contig file
format in addition to Ace. I don't want to end up with a class which
is too specialised for how Ace contigs work.

Peter



