[Biopython-dev] [BioPython] Ace contig files in Bio.SeqIO or Bio.AlignIO

Jose Blanca jblanca at btc.upv.es
Tue Jun 17 07:35:38 UTC 2008


Hi:
My main use of the Alignment class is to parse Ace files, and I've been 
thinking about this problem recently. My proposal to modify SeqRecord was 
motivated by it. I think the best solution would be to treat the Alignment as 
a sequence: the consensus would be the actual sequence, and the aligned reads 
would be features with per-base annotations. I've implemented such a class 
and it works fine for me. In fact, the Alignment class is just a wrapper 
around a standard SeqRecord (which I call RichSeq in my implementation).
To do that you just need a SeqRecord with a __getitem__ method. You have 
already proposed that, so it's not a problem.
Padding with spaces is not an option when you're dealing with genome-wide 
alignments; that's one of the problems with the current Alignment class.
If you want, I can send my implementation to the list, although it could take 
a while because my home computer is dead.
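
A minimal sketch of the idea described above, in plain Python. It is not the
RichSeq implementation mentioned in this message and it does not use any
Biopython classes: ContigRecord, ReadFeature and column() are hypothetical
names chosen for illustration. Each read is stored only with its own bases and
its offset on the consensus, so nothing is padded to the full contig length,
and slicing through __getitem__ clips the reads to the requested region.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReadFeature:
    """One aligned read kept as a feature on the consensus (hypothetical)."""
    name: str
    start: int   # 0-based offset of the read's first base on the consensus
    bases: str   # aligned bases of the read; Ace files mark pads with '*'

    @property
    def end(self) -> int:
        return self.start + len(self.bases)


@dataclass
class ContigRecord:
    """Consensus sequence plus its reads as features (hypothetical)."""
    name: str
    consensus: str
    reads: List[ReadFeature] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.consensus)

    def __getitem__(self, index):
        # An integer index returns a single consensus base.
        if not isinstance(index, slice):
            return self.consensus[index]
        start, stop, step = index.indices(len(self.consensus))
        if step != 1:
            raise ValueError("only contiguous slices are supported in this sketch")
        clipped = []
        for read in self.reads:
            # Keep only the part of each read that overlaps the slice.
            lo = max(read.start, start)
            hi = min(read.end, stop)
            if lo < hi:
                part = read.bases[lo - read.start:hi - read.start]
                clipped.append(ReadFeature(read.name, lo - start, part))
        return ContigRecord(self.name, self.consensus[start:stop], clipped)

    def column(self, position: int) -> List[str]:
        """Bases of every read that covers one consensus position."""
        return [r.bases[position - r.start]
                for r in self.reads
                if r.start <= position < r.end]


# Usage: two overlapping reads on a ten-base consensus.
contig = ContigRecord(
    "Contig1",
    "ACGTACGTAC",
    [ReadFeature("read1", 0, "ACGTAC"), ReadFeature("read2", 4, "ACGTAC")],
)
region = contig[3:8]   # a new ContigRecord with both reads clipped to 3..8
```

Storage grows with the total length of the reads rather than with the number
of reads times the contig length, which is the cost a padded alignment would
have.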
Best regards,

Jose Blanca

On Monday 16 June 2008 16:01:31 Peter wrote:
> I've recently had to deal with some contig files in the Ace format
> (output by CAP3, but many assembly programs will produce this output).
>
> We have a module for parsing Ace files in Biopython,
> Bio.Sequencing.Ace but I was wondering about integrating this into the
> Bio.SeqIO or Bio.AlignIO framework.
> http://www.biopython.org/wiki/SeqIO
> http://www.biopython.org/wiki/AlignIO
>
> I'd like to hear from anyone currently using Ace files, on how they
> tend to treat the data - and if they think a SeqRecord or Alignment
> based representation would be useful.
>
> Each contig in an Ace file could be treated as a SeqRecord using the
> consensus sequence.  The identifiers of each sub-sequence used to
> build the consensus could be stored as database cross-references, or
> perhaps we could store these as SeqFeatures describing which part of
> the consensus they support.  This would then fit into Bio.SeqIO quite
> well.
>
> Alternatively, each contig could be treated as an alignment (with a
> consensus) and integrated into Bio.AlignIO.  One drawback is that
> doing this with the current generic alignment class would require
> padding the start and/or end of each sequence with gaps in order to
> make every sequence the same length.  However, if we did this (or
> created a more specialised alignment class), the Ace file format would
> then fit into Bio.AlignIO too.
>
> So, Ace users - would either (or both) of the above approaches make
> sense for how you use the Ace contig files?
>
> Thanks
>
> Peter



-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


