[Biopython-dev] [BioPython] Ace contig files in Bio.SeqIO or Bio.AlignIO

Jose Blanca jblanca at btc.upv.es
Tue Jun 17 07:35:38 UTC 2008

My main use of the Alignment class is to parse Ace files, and I've been
thinking about this problem recently. It is what prompted my proposal to
modify SeqRecord. I think the best solution would be to treat the alignment
as a sequence: the consensus would be the actual sequence, and the aligned
reads would be features with per-base annotations. I've implemented such a
class and it works fine for me. In fact, in my code the Alignment class is
just a wrapper around a standard SeqRecord (I call it RichSeq in my
implementation).
To do that you just need a SeqRecord with a __getitem__ method. You have 
already proposed that, so that's not a problem.
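
A minimal sketch of this idea, assuming hypothetical ContigRecord and
ReadFeature classes (neither is part of Biopython, and this is not the RichSeq
implementation mentioned above). The consensus is the sequence, each read is a
feature carrying its own bases, and __getitem__ trims the reads to the slice:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReadFeature:
    """Hypothetical: one aligned read, located on the consensus."""
    name: str
    start: int  # 0-based start position on the consensus
    bases: str  # per-base annotation: one read base per consensus column

    @property
    def end(self) -> int:
        return self.start + len(self.bases)


@dataclass
class ContigRecord:
    """Hypothetical stand-in for a SeqRecord-like class with __getitem__."""
    seq: str  # the consensus sequence
    reads: List[ReadFeature] = field(default_factory=list)

    def __getitem__(self, index: slice) -> "ContigRecord":
        if not isinstance(index, slice) or index.step not in (None, 1):
            raise TypeError("only contiguous slices are supported in this sketch")
        start, stop, _ = index.indices(len(self.seq))
        sliced = []
        for read in self.reads:
            lo, hi = max(read.start, start), min(read.end, stop)
            if lo < hi:  # keep only the part of the read inside the slice
                sliced.append(
                    ReadFeature(
                        read.name,
                        lo - start,  # re-base onto the sliced consensus
                        read.bases[lo - read.start:hi - read.start],
                    )
                )
        return ContigRecord(self.seq[start:stop], sliced)


contig = ContigRecord(
    "ACGTTGCAAC",
    [ReadFeature("read1", 0, "ACGTTG"), ReadFeature("read2", 4, "TGCAAC")],
)
window = contig[3:7]  # consensus "TTGC" plus the overlapping part of each read
```

Each read stores only its own bases and a start coordinate, so nothing is
padded out to the full contig length.
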
Padding with spaces is not an option when you're dealing with genome-wide 
alignments; that's one of the problems with the current Alignment class.
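
To give a feel for the cost of padding, here is a back-of-the-envelope sketch.
The contig length, read count and read length are made-up illustrative
numbers, not measurements from any real assembly:

```python
def padded_cells(contig_length, read_lengths):
    """Characters stored if every read is padded to the full contig length."""
    return contig_length * len(read_lengths)


def unpadded_cells(read_lengths):
    """Characters stored if each read keeps only its own bases."""
    return sum(read_lengths)


# Assumed, illustrative numbers: a 1 Mb contig covered by 10,000 reads of 500 bp.
contig_length = 1_000_000
read_lengths = [500] * 10_000

padded = padded_cells(contig_length, read_lengths)
unpadded = unpadded_cells(read_lengths)
ratio = padded // unpadded  # how many times more characters padding stores
```

With these assumed numbers the padded form stores 10,000,000,000 characters
against 5,000,000 for the unpadded reads, a factor of 2,000.
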
If you want, I can send my implementation to the list, although it could take 
a while because my home computer is dead.
Best regards,

Jose Blanca

On Monday 16 June 2008 16:01:31 Peter wrote:
> I've recently had to deal with some contig files in the Ace format
> (output by CAP3, but many assembly programs will produce this output).
> We have a module for parsing Ace files in Biopython,
> Bio.Sequencing.Ace but I was wondering about integrating this into the
> Bio.SeqIO or Bio.AlignIO framework.
> http://www.biopython.org/wiki/SeqIO
> http://www.biopython.org/wiki/AlignIO
> I'd like to hear from anyone currently using Ace files, on how they
> tend to treat the data - and if they think a SeqRecord or Alignment
> based representation would be useful.
> Each contig in an Ace file could be treated as a SeqRecord using the
> consensus sequence.  The identifiers of each sub-sequence used to
> build the consensus could be stored as database cross-references, or
> perhaps we could store these as SeqFeatures describing which part of
> the consensus they support.  This would then fit into Bio.SeqIO quite
> well.
> Alternatively, each contig could be treated as an alignment (with a
> consensus) and integrated into Bio.AlignIO.  One drawback is that
> doing this with the current generic alignment class would require
> padding the start and/or end of each sequence with gaps in order to
> make every sequence the same length.  However, if we did this (or
> created a more specialised alignment class), the Ace file format would
> then fit into Bio.AlignIO too.
> So, Ace users - would either (or both) of the above approaches make
> sense for how you use the Ace contig files?
> Thanks
> Peter

Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
