[Bioperl-l] timing out a blast in StandAloneBlast.pm

BHurwitz@twt.com BHurwitz@twt.com
Tue, 2 Jul 2002 13:01:02 -0500


>But if the problem is only the memory limitation of the list of HSPs,
>why don't you just chop up the genome into appropriately sized and
>perhaps appropriately overlapping slices and run BLAST on them
>separately? Then, by simply adjusting the coordinates, you can generate
>exactly the same fragmented list that you wanted, and you can
>collect/store it in a separate file for a final sorting and later
>reprocessing.

Hi Peter,

Chopping up the sequences is a great idea!  This will definitely limit the
amount of memory needed to store all of these HSPs and stop killing our
Linux machines.  In the meantime, I have gotten pretty far in adding
Megablast to the bioperl modules.  I guess it is good to try both anyway.

-Bonnie



                                                                                                                                              
From:     Peter Kos <kos@rite.or.jp>
Date:     07/02/2002 12:26 PM
Reply-to: "kos@rite.or.jp"
To:       "'BHurwitz@twt.com'" <BHurwitz@twt.com>
cc:       "'bioperl-l@bioperl.org'" <bioperl-l@bioperl.org>
Subject:  RE: [Bioperl-l] timing out a blast in StandAloneBlast.pm
                                                                                                                                              
                                                                                                                                              




Hi Bonnie,

Wait a minute. This is a completely different issue.
I usually suggest first playing with the parameters, since people
tend to forget about them and encounter problems, which can be easily
solved this way.
But if the problem is only the memory limitation of the list of HSPs,
why don't you just chop up the genome into appropriately sized and
perhaps appropriately overlapping slices and run BLAST on them
separately? Then, by simply adjusting the coordinates, you can generate
exactly the same fragmented list that you wanted, and you can
collect/store it in a separate file for a final sorting and later
reprocessing.
(The same thing, just in another dimension/direction/axis/whatever in
the time-sequence-memory space.)
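
A rough sketch of the slicing and coordinate adjustment, written in Python
rather than Perl purely for illustration. The function names, the slice
size and the overlap are assumptions made up for this example and are not
part of bioperl or BLAST. It also assumes that BLAST reports 1-based
positions relative to each slice, and that an HSP lying entirely inside an
overlap comes back with identical endpoints from both slices.

```python
from typing import Iterable, Iterator, List, Tuple


def overlapping_slices(seq_len: int, size: int, overlap: int) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) windows covering [0, seq_len)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    start = 0
    while start < seq_len:
        end = min(start + size, seq_len)
        yield start, end
        if end == seq_len:
            break  # the last window already reaches the end of the sequence
        start += step


def to_genome_coords(slice_start: int, hsp_start: int, hsp_end: int) -> Tuple[int, int]:
    """Shift 1-based, slice-relative HSP positions to 1-based genome positions.

    slice_start is the 0-based genome offset of the slice (assumption).
    """
    return slice_start + hsp_start, slice_start + hsp_end


def merge_hits(per_slice_hits: Iterable[Tuple[int, List[Tuple[int, int]]]]) -> List[Tuple[int, int]]:
    """Combine per-slice HSPs into one sorted list in genome coordinates.

    Identical HSPs reported twice from an overlap region collapse to one.
    """
    merged = set()
    for slice_start, hsps in per_slice_hits:
        for hsp_start, hsp_end in hsps:
            merged.add(to_genome_coords(slice_start, hsp_start, hsp_end))
    return sorted(merged)


if __name__ == "__main__":
    # Hypothetical numbers: a 2500 bp sequence, 1000 bp slices, 200 bp overlap.
    print(list(overlapping_slices(2500, 1000, 200)))
    # The same HSP seen in two overlapping slices, plus one more hit.
    print(merge_hits([(0, [(850, 900)]), (800, [(50, 100), (300, 400)])]))
```

Each slice would be run through BLAST on its own, so only one slice's HSPs
need to be in memory at a time. The overlap should be at least as long as
the longest HSP you care about; otherwise a hit spanning a slice boundary
may be truncated or split, and this sketch does not try to stitch those
back together.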

It may not be as elegant as adding Megablast to StandAloneBlast.pm,
but at least you keep all the HSPs that you originally wanted.
Moreover, it seems simpler: I would be able to do this myself, whereas
I am not good enough to mess around in the modules.
You may well be much better at that, of course.

Best regards
Peter

> Thank you so much for your response.  I completely agree with you.
> Since BLAST is not retrieving HSPs in any particular order, timing out
> a BLAST does not make scientific sense.  Our problem is that our Linux
> boxes seem to crash on sequences that have a large number of HSPs (yes,
> we do need better hardware...).  My thought was to time these out and
> capture them in a separate file for more "specialized" processing
> later.  But I think perhaps a better solution is to play with the BLAST
> parameters to allow fewer HSPs through and use this set of parameters
> for the whole set, as you suggested.  Unfortunately, BLAST doesn't have
> options like Megablast does for limiting searches on "-p percent
> identity" and "-s score", so I am working on adding Megablast to the
> existing StandAloneBlast.pm program, which will hopefully help me to
> limit the HSPs a little better.  Having looked at the code for
> StandAloneBlast.pm, I see it is just a wrapper for BLAST, so there is
> no magic going on there that isn't in the regular BLAST program.  Thank
> you for all of your help!
>
> Kind Regards,
> Bonnie