[Biopython-dev] [Biopython] Bio.Sequencing.Ace

Jose Blanca jblanca at btc.upv.es
Mon Jun 29 15:16:06 UTC 2009


> Are you using Bio.Sequencing.Ace in your code, or did you write a whole
> new parser instead?
I wrote one, because I wanted to be able to get one particular contig, or just 
the contig or read names. But I don't think that is a problem. I guess 
that the Biopython parser could easily be adapted to do that.

> Now that I have been using Ace files in my own work, I've been meaning
> to look over your stuff. In some ways, a contig class can be seen as a
> generalisation of a multiple sequence alignment class. Certainly this is
> something we should improve in Biopython (as you might gather from
> some of the enhancement bugs on bugzilla, I have lots of ideas for the
> current alignment class), and I'm sure you have some great ideas too.

I think this is the main deviation from Biopython. The contig class is 
similar to an alignment class; in fact, my contig classes should be compatible 
with your new alignment API proposal.

alignment:
seq1 +++++++++>
seq2 +++++++++>
seq3 +++++++++>

contig:
seq1 ++++>
seq2    +++++>
seq3        ++++++>

Basically every read has a different coordinate system in the contig case. 
What I've done is to create a class named LocatableSequence that is a 
container for sequence objects. It works like:
>>> seq1 = 'ATCG'
>>> locseq1 = locate_sequence(seq1, location=10)
>>> locseq1[10] == 'A'
True
That way the contig is just a list of LocatableSequences, and the coordinate 
system transformations are done by the LocatableSequences, not by the contig.
The LocatableSequences also allow for masks.
A LocatableSequence works with any sequence-like object: str, Seq, 
SeqRecord, list, etc.
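
To make the coordinate translation concrete, here is a minimal sketch of the 
idea. It is not the actual code in LocatableSequence.py: the class name is the 
one used above, but the constructor signature and the IndexError behaviour are 
assumptions made only for illustration.

```python
class LocatableSequence:
    """Wrap any sequence-like object and place it at an offset
    in a parent (contig) coordinate system. Illustrative sketch only."""

    def __init__(self, sequence, location=0):
        self.sequence = sequence
        # Parent coordinate of the first item of the wrapped sequence
        self.location = location

    def __len__(self):
        return len(self.sequence)

    def __getitem__(self, index):
        # Translate the parent coordinate into the wrapped sequence's own one
        own = index - self.location
        if not 0 <= own < len(self.sequence):
            raise IndexError("position %d is outside this sequence" % index)
        return self.sequence[own]


# A contig is then just a list of located reads
contig = [
    LocatableSequence("ATCG", location=0),
    LocatableSequence("CGTTA", location=2),
]

# Column 3 of the contig, asked for in contig coordinates
column = [read[3] for read in contig]
```

Each read answers in contig coordinates, so the contig itself never has to 
know where a read starts.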
There's also a Location class that represents a fragment of a sequence. My 
Location class is more limited than the one in the Biopython SeqFeature. In 
my case the start and end should be integers. I use this class to represent 
the region not masked in the sequence and the Location of the sequence inside 
the LocatableSequence.
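
A Location limited to integer coordinates could look roughly like the sketch 
below. Again this is not the code from my module: the inclusive end and the 
apply() helper are assumptions chosen only to keep the example short.

```python
class Location:
    """A fragment of a sequence delimited by integer start and end.
    Illustrative sketch; the end is assumed to be inclusive."""

    def __init__(self, start, end):
        if not isinstance(start, int) or not isinstance(end, int):
            raise TypeError("start and end should be integers")
        if start > end:
            raise ValueError("start should not be greater than end")
        self.start = start
        self.end = end

    def __len__(self):
        return self.end - self.start + 1

    def __contains__(self, position):
        return self.start <= position <= self.end

    def apply(self, sequence):
        # Hypothetical helper: return the fragment covered by this location
        return sequence[self.start:self.end + 1]


# The same class can describe the unmasked region of a read
unmasked = Location(2, 5)
visible = unmasked.apply("ACGTACGT")
```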
Take a look at Contig.py and LocatableSequence.py; these are the most 
relevant modules for this.
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


