[Biopython-dev] [Biopython] Bio.Sequencing.Ace

Peter Cock p.j.a.cock at googlemail.com
Tue Jun 30 09:47:51 UTC 2009


On Tue, Jun 30, 2009 at 9:31 AM, Jose Blanca <jblanca at btc.upv.es> wrote:
>> What I was thinking of was a contig class as an alignment subclass,
>> holding a list of SeqRecord objects and offsets. The consensus might
>> just be one element of this list - but could be handled specially. This
>> sounds simpler than having to introduce a whole new object system,
>> related to but different to SeqFeature objects. However, I don't yet
>> have a sample implementation to demonstrate this.
>
> I thought about that implementation and wrote some code. The
> problem I found with that approach is that the contig class code got
> too messy. Bear in mind that besides the offset you also need
> the masks, and that some sequences could be reversed. That's why
> I decided to split the part that calculates the offset and the mask
> into a separate class.

A simple masked sequence class would also be useful for Roche SFF
files which hold sequencing reads (of about 500bp) with start and end
trim points. This is a use case separate from the location offset in an
alignment - so I'm not convinced it makes sense to do both in one
class.
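
As a rough illustration of the trimming use case, a minimal sketch
might look like the following. The class name MaskedRead and its
attributes are hypothetical (this is not an existing Biopython API),
and a plain string stands in for what would probably be a SeqRecord.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MaskedRead:
    """A read plus the trim points marking its usable region (hypothetical)."""

    name: str
    seq: str                        # full, untrimmed sequence
    trim_start: int = 0             # first kept position, 0-based
    trim_end: Optional[int] = None  # one past the last kept position

    def __post_init__(self):
        # Default to keeping everything up to the end of the read.
        if self.trim_end is None:
            self.trim_end = len(self.seq)
        if not 0 <= self.trim_start <= self.trim_end <= len(self.seq):
            raise ValueError("need 0 <= trim_start <= trim_end <= len(seq)")

    @property
    def trimmed(self) -> str:
        """Only the region between the trim points."""
        return self.seq[self.trim_start:self.trim_end]

    @property
    def masked(self) -> str:
        """Full-length sequence with the trimmed-off ends in lower case."""
        return (
            self.seq[:self.trim_start].lower()
            + self.seq[self.trim_start:self.trim_end].upper()
            + self.seq[self.trim_end:].lower()
        )


read = MaskedRead("read1", "NNACGTACGTNN", trim_start=2, trim_end=10)
print(read.trimmed)
print(read.masked)
```

Keeping the full sequence and storing only the trim points means the
mask can be changed later without losing any data.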

Perhaps having the contig class hold a list of (masked) SeqRecord
objects, their offset, and their direction would work?
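
To make that concrete, here is a minimal sketch of a contig holding
reads with an offset and a direction. All names (PlacedRead, Contig,
rows) are hypothetical, plain strings stand in for (masked) SeqRecord
objects, and offsets are assumed to be non-negative, which real
assembly files may not guarantee.

```python
from dataclasses import dataclass, field
from typing import List

# Complement table for plain DNA strings (assumption: only ACGTN are used).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class PlacedRead:
    """One read with its position and direction in the contig (hypothetical)."""

    name: str
    seq: str               # sequence as read (already trimmed)
    offset: int            # 0-based contig column of the first aligned base
    reverse: bool = False  # True if the read aligns as its reverse complement

    @property
    def aligned(self) -> str:
        """The sequence in contig orientation."""
        return reverse_complement(self.seq) if self.reverse else self.seq


@dataclass
class Contig:
    """A list of placed reads; the consensus could be one more entry."""

    name: str
    reads: List[PlacedRead] = field(default_factory=list)

    def add(self, read: PlacedRead) -> None:
        self.reads.append(read)

    def __len__(self) -> int:
        # Width of the contig: the right-most column covered by any read.
        return max((r.offset + len(r.seq) for r in self.reads), default=0)

    def rows(self, gap: str = "-") -> List[str]:
        """Each read padded with gaps to the full contig width."""
        width = len(self)
        return [(gap * r.offset + r.aligned).ljust(width, gap) for r in self.reads]


contig = Contig("contig1")
contig.add(PlacedRead("read1", "ACGTAC", offset=0))
contig.add(PlacedRead("read2", "CCGTAC", offset=2, reverse=True))
for row in contig.rows():
    print(row)
```

Here the masking stays with the read and the contig only records
placement, which keeps the two use cases separate.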

>> One important thing I think we should do BEFORE adding any contig
>> class to Biopython, is get it working with at least one other contig file
>> format in addition to Ace. I don't want to end up with a class which
>> is too specialised for how ace contigs work.
>
> Well, in fact my contig class is modeled on the CAF file format.
> The ACE parsing was just an afterthought; my primary interest
> was the CAF format.

Well, as the CAF file format was an extension of the ACE format,
it would be worth looking at a third contig format before deciding
whether a contig class is sufficiently general.

Have you got any links on the CAF file format that you found useful
when writing your parser, in addition to these?
http://www.sanger.ac.uk/Software/formats/CAF/
http://www.genome.org/cgi/content/full/8/3/260

Thanks,

Peter



