[BioLib-dev] [emboss-dev] EMBOSS 6.3.0 released - SAM/BAM

Peter biopython at maubp.freeserve.co.uk
Fri Jul 16 10:13:09 UTC 2010


On Fri, Jul 16, 2010 at 10:04 AM, Pjotr Prins <pjotr2010 at thebird.nl> wrote:
>
> From: Pjotr Prins <pjotr.public14 at thebird.nl>
> To: biolib-dev at lists.open-bio.org
> Subject: Re: [emboss-dev] EMBOSS 6.3.0 released - SAM/BAM
>
> EMBOSS has just recently added SAM/BAM support. I am looking at adding
> SAM/BAM support for the Bio* languages - BioRuby, BioPerl, BioPython
> and BioJava.
>
> There are three interesting implementations of BAM/SAM support. The
> Picard library (Java), Samtools API (C) and EMBOSS (C). A description
> of the Sequence Alignment/Map format (SAM) can be found
> [http://samtools.sourceforge.net/SAM1.pdf here]. SAM is a textual
> format, and BAM is the matching binary format. From the specification
> it is clear that BAM/SAM is a rather extensive format, for large
> files, and would certainly benefit from fast C parsing (over native
> Ruby/Perl/Python).

You might be able to get reasonable performance in Python, but C code
could be shared between languages.

Other implementations include my partial and unfinished implementation
of SAM+BAM parsing and indexing by read name in Python:
http://github.com/peterjc/biopython/tree/seqio-sam-bam
http://github.com/peterjc/biopython/tree/seqio-sam-bam-index
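
To illustrate the idea of indexing by read name, here is a minimal sketch in plain Python. It is not the code in the branches above. The names SamRecord, parse_sam_line and index_by_read_name are hypothetical, and the column names follow the current SAM specification (older drafts use MRNM/MPOS/ISIZE for RNEXT/PNEXT/TLEN). It assumes an uncompressed SAM file opened in binary mode, so that the recorded offsets are byte positions.

```python
import io
from collections import namedtuple

# The eleven mandatory tab-separated SAM columns, plus any optional tags.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual tags",
)


def parse_sam_line(line):
    """Parse one SAM alignment line (text) into a SamRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 columns, got %d" % len(fields))
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual, fields[11:])


def index_by_read_name(handle):
    """Map each read name to the byte offsets of its alignment lines.

    The handle must be opened in binary mode so tell() returns byte offsets.
    """
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b"@"):
            continue  # header line, not an alignment
        qname = line.split(b"\t", 1)[0].decode("ascii")
        index.setdefault(qname, []).append(offset)
    return index


# Tiny made-up SAM file held in memory, for illustration only.
EXAMPLE = (
    b"@HD\tVN:1.0\n"
    b"@SQ\tSN:chr1\tLN:1000\n"
    b"read1\t0\tchr1\t100\t30\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n"
    b"read2\t16\tchr1\t200\t20\t4M\t*\t0\t0\tTTGA\tIIII\n"
)

handle = io.BytesIO(EXAMPLE)
index = index_by_read_name(handle)

# Random access: jump straight to read2 without rescanning the file.
handle.seek(index["read2"][0])
record = parse_sam_line(handle.readline().decode("ascii"))
print(record.qname, record.rname, record.pos, record.cigar)
```

A list of offsets is kept per name because paired or multiply mapped reads can share one read name. BAM would need a different offset scheme, since its records sit inside compressed blocks.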

There are also the pysam Python wrappers for the samtools C API:
http://code.google.com/p/pysam/

And there is a "clean up" fork of pysam by Kevin Jacobs, some of which
he hopes will get folded back into pysam:
http://code.google.com/r/bioinformed-pysam/

Peter


