[Bioperl-l] problems with blast parser

Jason Stajich jason.stajich at duke.edu
Thu Apr 6 17:30:17 UTC 2006


I'm pretty sure that with thousands of HSPs this can be an out-of-memory
problem.  I've explained workarounds before on the list, but they
basically mean building a new listener object that creates simple
hashes (or arrays) instead of full-blown HSP objects.  Personally I
use a hybrid approach depending on the dataset: SearchIO can be too
slow and too memory-intensive for the cases where I am just getting
top hits or summary stats, but if I want the alignment strings, more
stats, etc., then I use SearchIO.


The question is: do you really want to be parsing a huge file, or can
you get away with using tabular output (-m 8 or -m 9) from BLAST?  If
you are balking at re-running the BLAST, something like blast2table is
a simple pure-Perl way to generate -m 8 tabular output from a BLAST
report very efficiently.  This is discussed on the BioPerl BLAST wiki
page, I believe.
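
As a rough illustration of the tabular route (a sketch, not code from
this thread): assuming the report was written with -m 8 or -m 9 and has
the standard 12 tab-separated columns, a line-by-line filter on percent
identity and alignment length could look like the Python below.  The
function name filter_hits, the field names, and the cutoff values are
made up for illustration.  Note that the length column in tabular output
is the alignment length, not the sequence length.

```python
import io

# Column order of legacy NCBI BLAST -m 8 / -m 9 tabular output.
FIELDS = (
    "query", "subject", "pct_identity", "aln_length", "mismatches",
    "gap_opens", "q_start", "q_end", "s_start", "s_end",
    "evalue", "bit_score",
)


def filter_hits(handle, min_identity=90.0, min_length=100):
    """Yield one small dict per hit line that passes both cutoffs.

    Lines are read one at a time, so memory use does not grow with
    the size of the report. The cutoffs are illustrative defaults.
    """
    for line in handle:
        line = line.rstrip("\n")
        # -m 9 adds comment lines starting with '#'; skip them and blanks.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            continue  # not a standard 12-column hit line
        hit = dict(zip(FIELDS, cols))
        hit["pct_identity"] = float(hit["pct_identity"])
        hit["aln_length"] = int(hit["aln_length"])
        if hit["pct_identity"] >= min_identity and hit["aln_length"] >= min_length:
            yield hit


if __name__ == "__main__":
    # Tiny made-up sample standing in for a real tabular report.
    sample = io.StringIO(
        "# Fields: Query id, Subject id, % identity, ...\n"
        "q1\ts1\t95.50\t200\t9\t0\t1\t200\t1\t200\t1e-100\t400\n"
        "q1\ts2\t80.00\t300\t60\t0\t1\t300\t1\t300\t1e-50\t200\n"
    )
    for hit in filter_hits(sample):
        print(hit["query"], hit["subject"], hit["pct_identity"], hit["aln_length"])
```

For a real report, pass an open file handle in place of the StringIO
sample; only the hits that pass the cutoffs are kept in memory.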


-jason
On Apr 6, 2006, at 11:56 AM, Chris Fields wrote:

> Alessandro,
>
> We need to know a few things first:
>
> 1)  What version of Bioperl?
> 2)  BLAST version?
> 3)  What OS?
> 4)  Perl version?
> 5)  Exactly how large is your file?
>
> It would also be nice to see at least a chunk of your script to rule out a
> logic error there.  If you want you can also submit your script by filing
> this as a bug in Bugzilla and attaching your script.
>
> http://www.bioperl.org/wiki/Bugs
>
> If you have an older version of Bioperl (such as 1.4) consider upgrading to
> 1.5.1 or CVS.  Lots of fixes have been incorporated since 1.4, including to
> SearchIO.
>
> Chris
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org]
>> On Behalf Of Alessandro S. Nascimento
>> Sent: Tuesday, April 04, 2006 10:28 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] problems with blast parser
>>
>> Hi all
>>
>> I'm trying to parse a standalone BLAST (blastpgp) result file and filter
>> some sequences using length and identity. The script used to work, but
>> this time, after several minutes running at 99.9% CPU, I get
>> a "killed" message with no more information. The BLAST file is very
>> large. Does anyone have any clue?
>>
>> Thanks in advance
>>
>> Alessandro
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




