[Bioperl-l] fetching all alignments from a sam/bam by read header in perl

Sun Feb 26 06:24:14 UTC 2012

Hi Guys

Reading the documentation page for Bio::DB::Sam, I see there is a way to fetch
reads by name (read ID), but the documentation also says this is slow (copied
below). I need to do about 300-500 million lookups, and if each one is
costly I would like to know whether there is another slick, low-level way. For
my application I would not have the feature location, just the read name.

-name          Filter on reads with the designated name. Note that
                 this can be a slow operation unless accompanied by
                 the feature location as well.
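
For what it is worth, one way to sidestep per-name lookups entirely would be
to name-sort the file first (samtools sort -n) and then stream it, so that all
alignments of a read arrive together and nothing large has to be held in
memory. The sketch below only illustrates that idea. It is written in Python
rather than Perl, it works on plain SAM text lines, and the function name
group_alignments_by_name is made up for this example; it is not part of
Bio::DB::Sam or samtools.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def group_alignments_by_name(
    sam_lines: Iterable[str],
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (read_name, alignment_lines) from name-sorted SAM text.

    Assumption: the input is already sorted by read name, so every
    alignment of a given read sits on consecutive lines.
    """
    # Header lines start with '@'; the SAM spec does not allow '@' in QNAME.
    records = (
        line.rstrip("\n")
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )
    # QNAME is the first tab-separated column of each alignment line.
    by_name = itertools.groupby(records, key=lambda rec: rec.split("\t", 1)[0])
    for name, group in by_name:
        yield name, list(group)


# Tiny made-up example: read1 has two alignments, read2 has one.
demo = [
    "@HD\tVN:1.4\tSO:queryname\n",
    "read1\t0\tchr1\t100\t30\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read1\t256\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tchr1\t200\t30\t4M\t*\t0\t0\tTTGA\tIIII\n",
]
grouped = {name: len(alns) for name, alns in group_alignments_by_name(demo)}
```

Only one read's alignments are in memory at a time, so this should scale to
hundreds of millions of records, at the cost of one up-front sort. If the
read names to look up can be sorted the same way, the two streams can be
walked side by side instead of doing random lookups.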

-Abhi

On Fri, Feb 24, 2012 at 6:58 AM, Abhishek Pratap <abhishek.vit at gmail.com> wrote:

> Hi Peter
>
> You got it right.
>
> Here is the link :
>
> http://biostar.stackexchange.com/questions/17787/fetching-all-alignments-from-a-sam-bam-by-read-header-in-perl
>
>
>
> -A
>
> On Fri, Feb 24, 2012 at 1:24 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> > On Fri, Feb 24, 2012 at 12:55 AM, Abhishek Pratap
> > <abhishek.vit at gmail.com> wrote:
> >> I am wondering if there is a slick way to access all the possible
> >> alignments for a read present in a SAM or BAM file given the read
> >> header. Since the existing codebase is in Perl, I would prefer
> >> something which can be done in/via Perl.
> >>
> >> By default BAMs are indexed by location, so the built-in samtools
> >> indexing won't work, I guess.
> >>
> >> I should also say the input BAM file will have on the order of 500
> >> million total alignments, and many reads are expected to be aligned to
> >> more than one place in the genome. Given the size of the data, loading
> >> it all into one big hash is not turning out to be memory friendly.
> >
> > Are you asking for SAM/BAM read lookup by read name?
> >
> >> PS:  I also posted this earlier on Biostar.
> >
> > Link?
> >
> > Peter
>