[Bioperl-l] problems with blast parser
Jason Stajich
jason.stajich at duke.edu
Thu Apr 6 17:30:17 UTC 2006
I'm pretty sure that with thousands of HSPs this is an out-of-memory
problem. I've explained workarounds on the list before, but they
basically mean building a new listener object that creates simple
hashes (or arrays) instead of full-blown HSP objects. Personally I
use a hybrid approach depending on the dataset: SearchIO can be too
slow and too memory-intensive when I am just getting top hits or
summary stats, but if I want the alignment strings, more stats, etc.,
then I use SearchIO.
The question is: do you really want to be parsing a huge report, or
can you get away with tabular output (-m 8 or -m 9) from BLAST? If
you are balking at re-running the BLAST search, something like
blast2table is a simple pure-Perl script that generates -m 8 style
tabular output from a BLAST report very efficiently. I believe this
is discussed on the BioPerl BLAST wiki page.
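
To make the tabular route concrete, here is a minimal sketch of
filtering -m 8 / -m 9 output by identity and alignment length one line
at a time, so memory use stays flat however large the file is. It is
written in Python purely for illustration (the same loop is a few
lines of Perl); the function name filter_hits and the thresholds of
90% identity and 100 aligned positions are assumptions, not anything
from the original script. The column order is the standard 12-column
tabular layout.

```python
import sys

# The 12 columns of BLAST -m 8 / -m 9 tabular output, in order.
FIELDS = ("query", "subject", "pct_identity", "aln_length", "mismatches",
          "gap_opens", "q_start", "q_end", "s_start", "s_end",
          "evalue", "bit_score")


def filter_hits(handle, min_identity=90.0, min_length=100):
    """Yield one dict per hit line that passes both thresholds.

    The input is read line by line, so nothing accumulates in memory.
    The default thresholds are hypothetical; pick your own.
    """
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and the '#' comment lines that -m 9 adds.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            continue  # not a well-formed hit line
        hit = dict(zip(FIELDS, cols))
        hit["pct_identity"] = float(hit["pct_identity"])
        hit["aln_length"] = int(hit["aln_length"])
        if (hit["pct_identity"] >= min_identity
                and hit["aln_length"] >= min_length):
            yield hit


if __name__ == "__main__":
    # Usage: python filter_hits.py report.m8
    with open(sys.argv[1]) as fh:
        for hit in filter_hits(fh):
            print(hit["query"], hit["subject"],
                  hit["pct_identity"], hit["aln_length"], sep="\t")
```

Note that the tabular format reports alignment length, not the full
length of the subject sequence, so a filter on sequence length would
need that information from elsewhere.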
-jason
On Apr 6, 2006, at 11:56 AM, Chris Fields wrote:
> Alessandro,
>
> We need to know a few things first:
>
> 1) What version of Bioperl?
> 2) BLAST version?
> 3) What OS?
> 4) Perl version?
> 5) Exactly how large is your file?
>
> It would also be nice to see at least a chunk of your script to rule
> out a logic error there. If you want you can also submit your script
> by filing this as a bug in Bugzilla and attaching your script.
>
> http://www.bioperl.org/wiki/Bugs
>
> If you have an older version of Bioperl (such as 1.4) consider
> upgrading to 1.5.1 or CVS. Lots of fixes have been incorporated since
> 1.4, including to SearchIO.
>
> Chris
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Alessandro S. Nascimento
>> Sent: Tuesday, April 04, 2006 10:28 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] problems with blast parser
>>
>> Hi all
>>
>> I'm trying to parse a standalone BLAST (blastpgp) result file and
>> filter some sequences using length and identity. The script used to
>> work, but this time, after several minutes running at 99.9% of my
>> processor, I get a "killed" message with no more information. The
>> BLAST file is very large. Does anyone have any clue?
>>
>> Thanks in advance
>>
>> Alessandro
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12