[Bioperl-l] reading blast report

Chris Fields cjfields at illinois.edu
Fri Jan 15 00:42:31 UTC 2010


On Jan 14, 2010, at 2:15 PM, Siddhartha Basu wrote:

> On Thu, 14 Jan 2010, Jason Stajich wrote:
> 
>> What aspects of the report are you loading?  You might consider the blast 
>> report as tab-delimited (-m 8 format) if you are only interested in 
>> start/end positions and scores of alignments, which is a simpler and reduced 
>> dataset that gives the parser a lower memory footprint.
> 
> I think this would be a better approach; I am mostly interested in
> start/end/score data only.
> 
>> SearchIO (default) -format => blast - you can try -format => 
>> blast_pull instead, which parses lazily to create objects and will reduce 
>> memory consumption.
> 
> It's another good option though. But just out of curiosity, does the
> regular blast parser load the entire file into memory, considering the
> output consists of multiple Results concatenated together into a
> single file? Could anybody clarify?

Yes, the original SearchIO parsers all load the data into objects.  This was based on the presumption that one wouldn't want to parse very large BLAST reports, but that assumption probably doesn't hold today.  The pull parser is one answer to that, in that it pulls the data only upon request (creating the objects on the fly), so it should be better suited to parsing very large BLAST reports.
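
To illustrate the memory argument for the tab-delimited route Jason mentioned, here is a minimal sketch in plain Python (not BioPerl, and not how SearchIO is implemented). It assumes the standard 12-column -m 8 layout (query id, subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score). The names Hit and iter_hits are hypothetical, made up for this example. The point is only that a generator hands back one record at a time, so memory use stays flat no matter how many results are concatenated in the file.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Hit(NamedTuple):
    """Start/end/score fields of one tabular BLAST line (hypothetical record type)."""
    query: str
    subject: str
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def iter_hits(handle: TextIO) -> Iterator[Hit]:
    """Lazily yield one Hit per line; only the current line is held in memory."""
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        # Columns 7-12 (1-based) carry the coordinates, e-value and bit score.
        yield Hit(f[0], f[1], int(f[6]), int(f[7]),
                  int(f[8]), int(f[9]), float(f[10]), float(f[11]))


# Tiny made-up sample standing in for a real report file.
sample = io.StringIO(
    "q1\ts1\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-100\t370\n"
    "q2\ts7\t85.00\t120\t18\t0\t5\t124\t300\t419\t2e-30\t150\n"
)

for hit in iter_hits(sample):
    print(hit.query, hit.subject, hit.q_start, hit.q_end, hit.bit_score)
```

With a real report you would pass an open file handle instead of the StringIO sample. The trade-off is the same one discussed above: the tabular format carries no alignment text or per-result statistics, so this only helps when start/end/score data is all that is needed.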

> thanks, 
> -siddhartha
> 
>> -jason

chris


