[Bioperl-l] Re: reading blast report

Siddhartha Basu sidd.basu at gmail.com
Thu Jan 14 21:40:42 UTC 2010


Thanks, Jason, for the clarification.

On Thu, 14 Jan 2010, Jason Stajich wrote:

>
> On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote:
>
> > On Thu, 14 Jan 2010, Jason Stajich wrote:
> >
> >> What aspects of the report are you loading?  You might consider the BLAST
> >> report as tab-delimited (-m 8 format) if you are only interested in the
> >> start/end positions and scores of alignments, which is a simpler, reduced
> >> dataset that gives the parser a lower memory footprint.
> >
> > > I think this would be the better approach; I am mostly interested in
> > > start/end/score data only.
> >
> >>
> >> SearchIO (default) -format => blast - you can try -format =>
> >> blast_pull instead, which parses lazily to create objects and will reduce
> >> memory consumption.
> >
> > > It's another good option though. But just out of curiosity, does the
> > > regular blast parser load the entire file into memory, considering the
> > > output consists of multiple results concatenated together into a
> > > single file? Could anybody clarify?
> >
> > thanks,
> > -siddhartha
>
> Each result is parsed (1 result per query), and all of its hits and HSPs are
> parsed and brought into memory with the standard (non-pull) approach.
> SearchIO iterates at the level of the result - that is why you call
> next_result, which parses one result at a time.
>
> >
> >
> >>
> >> -jason
> >> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:
> >>
> >>> Hi,
> >>> I have a script that reads a tblastn report (13000 records) and loads it
> >>> into a chado database (Bio::Chado::Schema module); however, the machine
> >>> runs out of memory. Leaving the database loading aside, I am trying to
> >>> figure out whether reading with the SearchIO module could consume a lot
> >>> of memory. So, when I am reading a blast file and getting the result
> >>> object ....
> >>>
> >>> while (my $result = $searchio->next_result)
> >>>
> >>> * Does the searchio object load a huge chunk of the file into memory, or
> >>> does each iteration read only a part of the result?
> >>>
> >>> * Would indexing the blast report and then reading from the index be
> >>> much faster, and why? And is there any way I could iterate through each
> >>> record in the index? Would that be helpful?
> >>>
> >>> -siddhartha
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> Jason Stajich
> >> jason.stajich at gmail.com
> >> jason at bioperl.org
> >> http://fungalgenomes.org/
> >>
>
>
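
For the tab-delimited route suggested above, here is a minimal sketch of
reading -m 8 output one line at a time in plain Perl, so that only a single
HSP row is held in memory at any moment. It does not use BioPerl at all. The
field names in @FIELDS and the sub names parse_m8_line and each_hsp are my own
labels for illustration, and the two sample rows are made up. The column order
is the standard 12-column tabular layout (with BLAST+ the equivalent output is
-outfmt 6). BioPerl also ships a 'blasttable' SearchIO format for the same
files, but I have not compared its memory use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order of BLAST tabular output (-m 8). The names are illustrative.
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Turn one tab-delimited line into a hash ref. Returns undef for blank
# lines, comment lines (as written by -m 9), and malformed rows.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|$)/;
    my @cols = split /\t/, $line;
    return unless @cols == @FIELDS;
    my %hsp;
    @hsp{@FIELDS} = @cols;
    return \%hsp;
}

# Read rows one at a time from a filehandle and hand each parsed row to a
# callback, so nothing accumulates between rows.
sub each_hsp {
    my ($fh, $callback) = @_;
    while (my $line = <$fh>) {
        my $hsp = parse_m8_line($line) or next;
        $callback->($hsp);
    }
}

# Made-up sample data standing in for a real report file.
my $sample = join "\n",
    join("\t", qw(q1 chr1 98.50 200 3 0 1 200 1500 901 1e-50 250)),
    join("\t", qw(q2 chr2 75.00 80 20 1 5 84 300 539 2e-10 60.5)),
    '';
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
each_hsp($fh, sub {
    my ($hsp) = @_;
    # A database insert would go here; this sketch just prints the row.
    print join("\t", @{$hsp}{qw(query subject s_start s_end bit_score)}), "\n";
});
close $fh;
```

For a real report, open the file itself instead of the in-memory string and
do the database insert inside the callback. Note that s_start can be larger
than s_end when the hit is on the minus strand, as in the first sample row.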
