[Bioperl-l] fetching all alignments from a sam/bam by read header in perl

Joel Martin j_martin at lbl.gov
Sun Feb 26 16:39:16 UTC 2012


Sort the BAM by read name so that all hits for a read are adjacent.  If you
subsequently need random lookups, you could add or alter tags on each read
with multiple hits to record where the other hits are, and then re-sort the
BAM by coordinate.
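
A minimal sketch of the name-sorted approach, under some assumptions: the BAM
has already been sorted by name (e.g. with `samtools sort -n`) and is streamed
as SAM text (e.g. from `samtools view`).  The function name
`alignments_by_read` and the demo records are made up for illustration, and
the sketch is in Python only to keep it short and self-contained.  Because
equal names are adjacent, only one read's hits are held in memory at a time.

```python
import io
from itertools import groupby


def alignments_by_read(sam_lines):
    """Yield (read_name, list_of_field_lists) from name-sorted SAM text.

    Assumes all records with the same QNAME (column 1) are adjacent,
    which is what sorting by name provides.
    """
    records = (
        line.rstrip("\n").split("\t")
        for line in sam_lines
        if line.strip() and not line.startswith("@")  # skip header lines
    )
    # groupby only merges *adjacent* equal keys, so it never needs to
    # hold more than one read's alignments at once.
    for name, group in groupby(records, key=lambda fields: fields[0]):
        yield name, list(group)


# Hypothetical name-sorted input; real input would be a pipe or file handle.
demo = io.StringIO(
    "@HD\tVN:1.4\tSO:queryname\n"
    "read1\t0\tchr1\t100\t30\t4M\t*\t0\t0\tACGT\tIIII\n"
    "read1\t256\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII\n"
    "read2\t16\tchr1\t900\t30\t4M\t*\t0\t0\tTTGA\tIIII\n"
)

# Collect (reference name, position) for every hit of each read.
hits = {
    name: [(fields[2], int(fields[3])) for fields in recs]
    for name, recs in alignments_by_read(demo)
}
```

The same pattern should carry over to Perl: read the name-sorted records line
by line, compare column 1 with the previous record, and flush the accumulated
hits whenever the read name changes.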

Joel

On Sat, Feb 25, 2012 at 10:24 PM, Abhishek Pratap <abhishek.vit at gmail.com> wrote:

> Hi Guys
>
> Reading the doc page for Bio::DB::Sam I see there is a way to fetch reads
> by name (read id), but the documentation also says this is slow (copied
> below).  I need to do about 300-500 million lookups, and if each one is
> costly I wanted to know if there is another slick low-level way.  For my
> application I would not have the feature location, just the read name.
>
> -name          Filter on reads with the designated name. Note that
>                 this can be a slow operation unless accompanied by
>                 the feature location as well.
>
>
> -Abhi
>
>
>
> On Fri, Feb 24, 2012 at 6:58 AM, Abhishek Pratap <abhishek.vit at gmail.com>
> wrote:
>
> > Hi Peter
> >
> > You got it right.
> >
> > Here is the link :
> >
> >
> > http://biostar.stackexchange.com/questions/17787/fetching-all-alignments-from-a-sam-bam-by-read-header-in-perl
> >
> >
> >
> > -A
> >
> > On Fri, Feb 24, 2012 at 1:24 AM, Peter Cock <p.j.a.cock at googlemail.com>
> > wrote:
> > > On Fri, Feb 24, 2012 at 12:55 AM, Abhishek Pratap
> > > <abhishek.vit at gmail.com> wrote:
> > >> I am wondering if there is a slick way to access all the possible
> > >> alignments for a read present in a SAM or BAM file given the read
> > >> header. Since the existing codebase is in Perl I would prefer
> > >> something which can be done in/via Perl.
> > >>
> > >> By default BAMs are indexed by location, so the built-in samtools
> > >> indexing won't work, I guess.
> > >>
> > >> I should also say the input BAM file will have on the order of 500
> > >> million total alignments, and many reads are expected to be aligned to
> > >> more than one place in the genome. Given the size of the data, loading
> > >> it all into one big hash is not turning out to be memory friendly.
> > >
> > > Are you asking for SAM/BAM read lookup by read name?
> > >
> > >> PS:  I also posted this earlier on Biostar.
> > >
> > > Link?
> > >
> > > Peter
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


