[Bioperl-l] updating a reciprocal blast file

JustinV jvaughn7 at utk.edu
Fri Jun 6 14:18:16 UTC 2008


Jason,

Thanks for the suggestions.

The blast report is a single file containing each query in the 3 proteomes
against a database of the 3 proteomes.  As you probably remember, OrthoMCL
offers many modes of running that head-off various levels of processing.  In
any case, the reciprocal blast is the bottle-neck.  Mode -3 forgoes this
step, but it requires a single, exhaustive blast report.  Perhaps, I could
run the reciprocal blasts separately as you suggested and then integrate
them with your pairwise_blast_jobs_big.pl.  I have to say, this code looks a
little spooky.  Wouldn't it be possible to just resort (by normalized
e-value, discussed below) and reprint each set of query hits based on a hash
or index of the results of that query against the new proteome?  And then
tack on the results of new proteome against the updated database to the end
of the total blast report.

In terms of normalizing the e-value, since I am using a consistent scoring
matrix, can't I just recalculate the scores based on new database size:

(new e-value) = (new database size) * (old e-value) / (old database size)

as in http://www.springerlink.com/content/55m318wwqdgtw85h/ (methods) and
elsewhere.

As of yet, I've been satisfied with the run time downstream of the
reciprocal blast, but, as I've said, I'm currently only using three plant
(dicot) proteomes.  By "hack ORTHOMCL locally to cache things in DB_Files"
do you mean serializing the blastSeq objects from early blasts in the
blast_parse subroutine using bioperl-db or something?  Maybe this is dumb
assumption.  In any case, I'd be curious to see your modified version of the
OrthoMCL script.

Justin


Jason Stajich-3 wrote:
> 
> Are you keeping each pairwise in a separate file and then combining  
> it all?
>   http://fungalgenomes.org/~stajich/scripts/pairwise_blast_jobs_big.pl
> 
> Are you fixing E-values so they are scaled across different sized  
> databases? You will probably want to add a Z= parameter to insure  
> values are useable.
> 
> I also had to hack ORTHOMCL locally to cache things in DB_Files as it  
> was too memory intensive the way it runs on my big datasets.
> 
> -jason
> On Jun 5, 2008, at 8:53 AM, JustinV wrote:
> 
>>
>> I have a large reciprocal blast file that contains 3 proteomes.   
>> I'd like to
>> integrate another proteome for downstream clustering.    I imagine a
>> command-line script that takes as input the new proteome in fasta  
>> format,
>> the directory of the the old proteomes in fasta format, and the pre- 
>> existing
>> reciprocal blast file, and then performs the proper blasts and  
>> updates the
>> pre-existing reciprocal blast file accordingly.  I am using blast  
>> locally
>> and the downstream processing is done by OrthoMCL.  I assume this  
>> has been
>> handled before, but I can't track down the code.  If anyone is  
>> familiar with
>> a pre-exisiting script or has pertinent advice, I'd be much obliged.
>>
>> Justin
>> -- 
>> View this message in context: http://www.nabble.com/updating-a- 
>> reciprocal-blast-file-tp17673277p17673277.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/updating-a-reciprocal-blast-file-tp17673277p17693254.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.




More information about the Bioperl-l mailing list