[Biopython-dev] Alignment object

Kevin Jacobs <jacobs@bioinformed.com> bioinformed at gmail.com
Tue Mar 2 17:07:03 UTC 2010


On Tue, Mar 2, 2010 at 10:03 AM, Brad Chapman <chapmanb at 50mail.com> wrote:

> Kevin;
> > > I'm just jumping in here and have not yet read all of the background
> > > material.  However, I am working with next-gen alignments and am
> > > curious as to what you have in mind.  At first glance, it sounds like
> > > you want to access aligned reads in a 'pileup' format (i.e., an object
> > > model akin to http://samtools.sourceforge.net/pileup.shtml).  Or are
> > > you thinking of something different entirely?
>
> This is a good way to go. SAM is at least an emerging standard that
> people are adopting, and samtools and the pysam module do a good job
> of dealing with them:
>
> http://code.google.com/p/pysam/
>
>
I find pysam pretty limited for doing more than reading and subsetting
SAM/BAM files.  I'm planning to add a constructor and helper functions for
creating new aligned reads.  The current AlignedRead object is also
read-only, which will need to be relaxed for many serious applications.
Until then, I'm writing (text) SAM records and piping them to samtools to
encode in BAM format (see the script attached to one of my earlier emails).
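
As a rough illustration of that workaround (not the attached script itself), the sketch below formats one alignment as a tab-separated text SAM line and pipes the text to `samtools view` for BAM encoding. The function names `format_sam_record` and `write_bam` are hypothetical, and the code assumes a `samtools` executable on the PATH that accepts `view -b -o OUT -`; older releases may also need `-S` to read SAM input.

```python
import subprocess


def format_sam_record(qname, flag, rname, pos, mapq, cigar, seq, qual,
                      rnext="*", pnext=0, tlen=0, tags=()):
    """Return one SAM alignment line (no trailing newline).

    The 11 mandatory columns are emitted in SAM order, followed by any
    optional TAG:TYPE:VALUE strings passed in `tags`.
    """
    fields = [qname, flag, rname, pos, mapq, cigar,
              rnext, pnext, tlen, seq, qual]
    fields.extend(tags)
    return "\t".join(str(field) for field in fields)


def write_bam(header_lines, records, bam_path):
    """Pipe text SAM (header lines plus records) to samtools to encode BAM.

    Assumption: `samtools` is installed and on the PATH.
    """
    sam_text = "\n".join(list(header_lines) + list(records)) + "\n"
    subprocess.run(
        ["samtools", "view", "-b", "-o", bam_path, "-"],
        input=sam_text.encode("ascii"),
        check=True,  # raise CalledProcessError if samtools fails
    )
```

A call might look like `write_bam(["@HD\tVN:1.0\tSO:unsorted", "@SQ\tSN:chr1\tLN:1000"], [format_sam_record("read1", 0, "chr1", 100, 60, "4M", "ACGT", "IIII")], "out.bam")`, where the reference name, length, and read values are made up for the example.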


> pysam exposes a Pileup style API from sorted and indexed BAM files
> and scales great for large alignment files:
>
> http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/api.html



Scalability is okay for conversion to pileup format, but not what I'd
consider great.  But I agree, pysam is a good starting point.  I just wish
that the read identifiers and attributes were available via the C API,
since those are often needed when, e.g., writing a genotype caller.
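
To make the point about read identifiers concrete, here is a minimal pure-Python sketch of a pileup-style view that keeps the read name next to each base. It does not use pysam or the samtools C API; `build_pileup` is a hypothetical helper, and it assumes ungapped alignments given as (name, 1-based start, sequence) tuples.

```python
from collections import defaultdict


def build_pileup(reads):
    """Group aligned bases by reference position.

    `reads` is an iterable of (name, start, seq) tuples, where `start`
    is the 1-based reference position of the first base and the
    alignment is assumed to contain no indels or clipping.
    Returns a dict mapping position -> list of (read name, base).
    """
    columns = defaultdict(list)
    for name, start, seq in reads:
        for offset, base in enumerate(seq):
            columns[start + offset].append((name, base))
    return dict(columns)
```

With the read name carried in every column, a genotype caller can, for example, avoid counting both mates of a pair twice or look up per-read attributes for the bases it sees at a site.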

-Kevin


