[Bioperl-l] producing an organism report from BLAST results?
Barry Moore
bmoore at genetics.utah.edu
Tue Nov 22 13:11:47 EST 2005
Tim-
The blastall command line options are sent in the http header when using
RemoteBlast, and you add them like this (some can also be passed as a
hash to new):
$Bio::Tools::Run::RemoteBlast::HEADER{'NCBI_GI'} = 'yes';
The above example should be equivalent to the blastall -I T option.
Bioperl's RemoteBlast uses the urlapi for blast from NCBI which is
documented here: http://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html. You
can see all of the parameters there.
I tried the above example, and I don't get the gi numbers in the report.
I suspected that they weren't making it through the parser, so I stepped
through the bioperl blast submission and retrieval process, and there
are no gi numbers in the report even before the bioperl parser gets a
hold of them. However, I can see the gi numbers come and go in the web
based output by modifying this url:
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?NCBI_GI=yes&DATABASE=swisspr
ot&QUERY=%3E+%0Anmpkvileshskptdsvflqpwikaliednsehdqyhpsghvipsltkqdlalphm
sptiltn&COMPOSITION_BASED_STATISTICS=off&EXPECT=1&SERVICE=plain&FORMAT_O
BJECT=Alignment&CMD=Put&FILTER=L&PROGRAM=blastp
This is the same URL used by bioperl to submit this query (in fact I
copied it from the debugger as I was stepping through the bioperl code).
Below is the script I used for testing. If anyone wants to look at the
POST and the results, fire up this script in the debugger and run the
following commands from the debugger prompt. Keep repeating c 561 until
the NCBI server returns a report. You should see both the URL used, and
the output of the report.
c Bio::Tools::Run::RemoteBlast::retrieve_blast
n
$self->verbose(1);
c 561
Anyone else got any ideas why we're not getting gi numbers, or if we are
getting them, how you access them?
Barry
use strict;
use Bio::PrimarySeq;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Bio::SearchIO::Writer::TextResultWriter;
my @sequences;
for my $seq
("nmpkvileshskptdsvflqpwikaliednsehdqyhpsghvipsltkqdlalphmsptiltn",
"edwckgmdmdprkallivgipmecseveiqdtvkaglqplcayrvlgrmfrrednakavf") {
push @sequences, Bio::PrimarySeq->new(-seq => $seq,
-alphabet => 'protein');
}
#Remote-BLAST factory object creation and blast-parameter initialization
my $BLAST_factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>
'blastp',
'-data' =>
'swissprot',
'-expect' =>
'1',
'-readmethod' =>
'SearchIO' );
$Bio::Tools::Run::RemoteBlast::HEADER{'NCBI_GI'} = 'yes';
#Loop to BLAST each seq against the database, and check for a hit.
for my $seq (@sequences) {
print "Blasting Sequence: " . $seq->seq. "\n";
#Blast the sequence against a database:
my $job = $BLAST_factory->submit_blast($seq);
#Load the RID returned for the BLAST job submitted
my @rids = $BLAST_factory->each_rid;
my $rid = shift @rids;
my $blast_results;
until ($blast_results) {
print "Checking RID: $rid\n";
$blast_results = $BLAST_factory->retrieve_blast($rid);
sleep 5;
}
#Was a result returned?
if( !ref($blast_results) ) {
#If so and it returned an error remove that RID from the stack
if ($blast_results < 0) {
$BLAST_factory->remove_rid($rid);
}
}
#If a result was returned and it isn't an error, then pass it to a
#variable...
else {
my $result = $blast_results->next_result();
my $writer = new Bio::SearchIO::Writer::TextResultWriter();
my $out = new Bio::SearchIO(-writer => $writer);
$out->write_result($result);
$BLAST_factory->remove_rid($rid); #...and remove it's RID
}
}
}
More information about the Bioperl-l
mailing list