[Biopython-dev] [biopython] added unstable Bam parser class (e6343eb)

Peter Cock p.j.a.cock at googlemail.com
Tue Apr 2 16:07:33 UTC 2013


On Tue, Apr 2, 2013 at 11:01 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
> I did a small test, just getting rec.rname and rec.pos (using Peter's
> parser). This is something I actually need to do, to calculate basic
> statistics.
>
> Indeed for 1M reads, samtools is 3s whereas the pure Python parser takes 20s.
>
> Tiago

Those numbers are more believable. Was that using SAM or BAM?
Which Python?

Note that rname (the name of the reference a read is mapped to) is
an interesting one: it is given explicitly as a string in SAM, but
as an integer index into the header's reference list in BAM. The
pysam parser gives the low-level index when parsing BAM, while mine
is consistent and returns the reference name as a string for both
SAM and BAM. This was a design choice to make the BAM reads
self-contained and avoid some of the rough edges with pysam, where
you must sometimes manage the reference indexes manually.
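
To illustrate the lookup involved, here is a minimal sketch of
turning a BAM-style integer reference index back into the SAM-style
name. The function resolve_rname and the example reference names are
hypothetical, made up for this illustration; this is not code from
pysam or from my parser, just the general idea, assuming the
reference names have already been read from the BAM header in order.

```python
def resolve_rname(ref_id, ref_names):
    """Map a BAM integer reference index to a SAM-style RNAME string.

    Hypothetical helper for illustration only. ref_names is assumed to
    be the list of reference names in BAM header order.
    """
    if ref_id == -1:
        # BAM uses -1 for a read with no reference; SAM writes "*".
        return "*"
    if not 0 <= ref_id < len(ref_names):
        raise ValueError("reference index %d not in header" % ref_id)
    return ref_names[ref_id]


# Made-up reference names standing in for a real BAM header.
ref_names = ["chr1", "chr2", "chrM"]

mapped = resolve_rname(1, ref_names)     # name of the second reference
unmapped = resolve_rname(-1, ref_names)  # SAM placeholder for "no reference"
```

Doing this once inside the parser means each read carries its own
reference name, so calling code never needs to keep the header's
reference list around just to interpret a read.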

Regards,

Peter



