[Bioperl-l] RemoteBlast.pm getting RID requests - make/alter the method?
cjfields at uiuc.edu
Mon Feb 6 17:27:56 UTC 2006
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bradley J. S. C. Olson
> Sent: Friday, February 03, 2006 2:54 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] RemoteBlast.pm getting RID requests - make/alter the method?
> I have been working with the RemoteBlast.pm module and have found that it
> is a bit clunky to use loops to keep checking to see if your RID has
> finished. For example, every time you write a script, you need to add a
> code block (see example in the documentation) in order to keep checking if
> @rid is done. Would it be better to write this in as a method in the
> RemoteBlast module? It seems like it would be better for RemoteBlast to
> have a method we could call, say retrieve_when_done, that would return the
> blast report once the value of retrieve_blast is no longer 0.
Sounds reasonable, though I'm not sure how easy it would be to implement.
Why not drop by Bugzilla (http://bugzilla.bioperl.org/) and submit this as
an enhancement request?
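For reference, the polling idiom you describe looks roughly like the example
in the RemoteBlast documentation; this is a paraphrased sketch, not verbatim
from the docs:

```perl
# Sketch of the usual RemoteBlast polling loop (paraphrased from the docs).
# retrieve_blast($rid) returns 0 while the job is still running, a negative
# value on error, and a Bio::SearchIO object once results are ready.
use strict;
use warnings;
use Bio::Tools::Run::RemoteBlast;

my $remote_blast = Bio::Tools::Run::RemoteBlast->new(
    -prog   => 'blastp',
    -data   => 'swissprot',
    -expect => '1e-10',
);
$remote_blast->submit_blast('query.fa');   # filename is illustrative

while (my @rids = $remote_blast->each_rid) {
    for my $rid (@rids) {
        my $rc = $remote_blast->retrieve_blast($rid);
        if (!ref $rc) {
            $remote_blast->remove_rid($rid) if $rc < 0;  # submission error
            sleep 5;                                     # still running
        }
        else {
            $remote_blast->remove_rid($rid);
            my $result = $rc->next_result;               # Bio::Search::Result
            print 'Query: ', $result->query_name, "\n";
        }
    }
}
```

A retrieve_when_done method would presumably just wrap this loop and block
until every RID returns something other than 0.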
> The only issue may be report parsing, but I wonder if it might be better to
> separate out submittal/retrieval of BLAST requests from the parsing step to
> make these more discrete processes? Since NCBI seems to be dropping support
> for text results as a standard, maybe the module should work exclusively
> with XML, and we could move report handling away from the headaches of text
> processing and just let Bio::SearchIO or blastxml handle the task of making
> blast reports into different forms (such as HTML, text, etc.).
They are separated. RemoteBlast executes BLAST remotely (via HTTP).
Results are parsed via various Bio::SearchIO modules depending on what you
set '-readmethod' to. This is from perldoc:
Class for remote execution of the NCBI Blast via HTTP.
For a description of the many CGI parameters see:
Various additional options and input formats are available.
And from the Bio::SearchIO perldoc:

This is a driver for instantiating a parser for report files from
sequence database searches. This object serves as a wrapper for the
format parsers in Bio::SearchIO::* - you should not need to ever use
those format parsers directly. (For people used to the SeqIO system, we are
deliberately using the same pattern.)
Once you get a SearchIO object, calling next_result() gives you back a
Bio::Search::Result::ResultI compliant object, which is an object that
represents one Blast/Fasta/HMMER whatever report.
A list of module names and formats is below:
blast BLAST (WUBLAST, NCBIBLAST, bl2seq)
fasta FASTA -m9 and -m0
blasttable BLAST -m9 or -m8 output (NCBI not WUBLAST tabular)
psl UCSC PSL format
waba WABA output
axt AXT format
hmmer HMMER hmmpfam and hmmsearch
exonerate Exonerate CIGAR and VULGAR format
blastxml NCBI BLAST XML
wise Genewise -genesf format
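To make the table concrete, the usual SearchIO pattern (per the HOWTO) is to
pick a format and iterate result -> hit -> hsp; the filename here is
illustrative:

```perl
# Typical Bio::SearchIO usage: -format is one of the names from the table
# above; the same loop works for any of them.
use strict;
use warnings;
use Bio::SearchIO;

my $in = Bio::SearchIO->new(-format => 'blast', -file => 'report.bls');
while (my $result = $in->next_result) {        # one report per result
    while (my $hit = $result->next_hit) {      # one database hit
        while (my $hsp = $hit->next_hsp) {     # one aligned segment
            printf "%s\t%s\t%.1f%%\n",
                $result->query_name, $hit->name, $hsp->percent_identity;
        }
    }
}
```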
See the SearchIO HOWTO linked from http://bioperl.org/HOWTOs/
This is also in the wiki online now:
I think the current line of thought is to make XML the default, but I also
know you would irritate a LOT of people out there by cutting off text output
parsing completely. Roger Hall or Jason pointed out that doing so will
break many scripts out there.
Furthermore, the problems with text output parsing are usually minimal. For
instance, the last one was a small change which broke a regex, causing an
infinite loop; the actual bug was in Bio::SearchIO::blast and not in
RemoteBlast. A simple addition to the regex fixed it. The only change to
RemoteBlast was to implement the option of saving XML formatted BLAST output.
I do like the idea of using XML output to build custom (bioperl-specific)
BLAST reports, but that also requires more work, likely a lot more work.
Again, maybe add that as an enhancement in Bugzilla or, better yet, submit
some sample code as an example.
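As a starting point for such sample code, something like the following might
work. Note the -readmethod value is an assumption on my part; check the
current RemoteBlast docs for the exact spelling before relying on it:

```perl
# Sketch: request XML output from RemoteBlast and let the blastxml
# SearchIO parser handle it, sidestepping text-format parsing entirely.
# The -readmethod value 'xml' is an assumption; verify against the docs.
use strict;
use warnings;
use Bio::Tools::Run::RemoteBlast;

my $remote_blast = Bio::Tools::Run::RemoteBlast->new(
    -prog       => 'blastp',
    -data       => 'swissprot',
    -expect     => '1e-10',
    -readmethod => 'xml',       # ASSUMED option value, not confirmed
);
$remote_blast->submit_blast('query.fa');

# Poll as usual; on success retrieve_blast hands back a SearchIO object
# built on the blastxml parser, from which custom reports could be built.
```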
> This would definitely simplify coding with the RemoteBlast.pm module, since
> then you could treat the report retrieval process as an object and just
> wait for the object to return its value, instead of coding in a bunch of
> test loops to see if it is done. This may also help keep bugs out of the
> module, make it longer lasting, and not require module users to rewrite
> their code every time NCBI makes changes.
I think the most stable way of submitting jobs is by using the netblast
client (blastcl3) and parsing the results from that. No CGI, no HTML, just
saving to a temp file and parsing through SearchIO.
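That workflow is short enough to sketch; the blastcl3 options below are the
classic netblast ones (-p program, -d database, -i query, -o output), but
double-check them against your installed client:

```perl
# Sketch: run NCBI's netblast client (blastcl3) locally, save the report
# to a temp file, and parse it with Bio::SearchIO -- no CGI scraping.
use strict;
use warnings;
use File::Temp qw(tempfile);
use Bio::SearchIO;

my (undef, $outfile) = tempfile(SUFFIX => '.bls');
system('blastcl3', '-p', 'blastp', '-d', 'nr',
       '-i', 'query.fa', '-o', $outfile) == 0
    or die "blastcl3 failed: $?";

my $in = Bio::SearchIO->new(-format => 'blast', -file => $outfile);
while (my $result = $in->next_result) {
    print $result->query_name, ': ', $result->num_hits, " hits\n";
}
```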
RemoteBlast was designed, I believe, with the idea of letting researchers
with some basic knowledge of perl use an interface familiar to them (i.e.
the BLAST interface at NCBI) and retrieve results on a regular basis. The
results are parsed via SearchIO::blast/blastxml/blasttable. The problem is,
though convenient, RemoteBlast is also reliant on the powers that be at NCBI
not changing anything dramatically. It is possible that NCBI could modify
the HTML code from the BLAST retrieval process, thus breaking RemoteBlast.
Text output could change again, even more dramatically, thus severely
breaking Bio::SearchIO::blast. Thus, we adapt to those changes by modifying
the broken modules. It's evolution at its finest. It's also a fact of life
that code breaks and needs to be fixed every once in a while to stay useful.
Okay, I'm waxing philosophical now so I know I've definitely had too much
coffee. Must get back to work...
> Any thoughts or ideas?
> Is anyone working on this?
> Brad Olson
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign