[Bioperl-l] Easy switching from wwwBlast to QBlast

Madeleine Lemieux mlemieux at bioinfo.ca
Tue Mar 22 12:26:35 EST 2005


Jason,

I've used RemoteBlast.pm (1.5 version) as a template for a new module 
I'm calling LocalServerBlast.pm which lets users submit jobs to a 
wwwBlast server. I've also added a subroutine, wwwBlast_sequence, to 
Perl.pm that mimics blast_sequence. I've added support for both 
procedures to pass CGI parameters through to submit_blast. So one can 
do:
	# for QBlast
	use Bio::Perl;
	my $blast_report = blast_sequence($seq,
		{'-expect' => 1e-6, DESCRIPTIONS => 25});
and
	# for wwwBlast
	use Bio::Perl;
	my $blast_report = wwwBlast_sequence($seq,
		{'-expect' => 1e-6, DESCRIPTIONS => 25});

Only the procedure name has to be changed to switch between wwwBlast 
and QBlast. In fact, if a hash of parameters gets set up that includes 
both QBlast and wwwBlast options mixed in together, it doesn't matter 
since each server only looks at the parameters it recognizes and 
ignores all others; as long as the values of the parameters it does 
recognize are set correctly it just works. The only kludge needed for 
this is for ALIGNMENT_VIEW where QBlast expects a string but wwwBlast 
uses numbers to specify the view option. And in both cases, requesting 
a tabular view will cause the blast result parser to fail. I haven't 
bothered catching that particular error since it's the same behaviour 
in both cases.
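
The ALIGNMENT_VIEW kludge could look roughly like the sketch below. This is not the code in the attached patches: normalize_alignment_view is a hypothetical helper, the view names are the QBlast-style strings, and the numeric codes are an assumption that should be checked against the wwwBlast form on your own server. Unlike the modules described above, the sketch also refuses a tabular view up front, since the parser cannot handle it.

```perl
use strict;
use warnings;

# ASSUMPTION: name <-> number mapping for ALIGNMENT_VIEW. The numbers are
# illustrative and must be verified against the wwwBlast install in use.
my %VIEW_NAME_TO_CODE = (
    Pairwise                      => 0,
    QueryAnchored                 => 1,
    QueryAnchoredNoIdentities     => 2,
    FlatQueryAnchored             => 3,
    FlatQueryAnchoredNoIdentities => 4,
    Tabular                       => 9,
);
my %VIEW_CODE_TO_NAME = reverse %VIEW_NAME_TO_CODE;

# Hypothetical helper: return a copy of the parameter hash with
# ALIGNMENT_VIEW rewritten for the target server ('qblast' or 'wwwblast').
# All other parameters are passed through untouched.
sub normalize_alignment_view {
    my ($params, $server) = @_;
    my %out = %$params;
    return \%out unless defined $out{ALIGNMENT_VIEW};

    # Accept either form on input and resolve it to a view name first.
    my $view = $out{ALIGNMENT_VIEW};
    my $name = $view =~ /^\d+$/ ? $VIEW_CODE_TO_NAME{$view} : $view;
    die "unknown ALIGNMENT_VIEW '$view'\n"
        unless defined $name && exists $VIEW_NAME_TO_CODE{$name};
    die "tabular output is not supported by the report parser\n"
        if $name eq 'Tabular';

    # wwwBlast gets the number, QBlast gets the string.
    $out{ALIGNMENT_VIEW} =
        $server eq 'wwwblast' ? $VIEW_NAME_TO_CODE{$name} : $name;
    return \%out;
}
```

With this in place the same hash can be handed to either procedure, e.g. normalize_alignment_view(\%params, 'wwwblast') before calling wwwBlast_sequence.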

The interest for me was to use this for prototyping software that will 
eventually hit the NCBI Blast server but without clogging up the NCBI 
queue or wasting my internet connect time while I'm developing. I can 
also imagine it being useful in a center generating its own sequence 
databases and already using wwwBlast.

Since the wwwBlast server doesn't support queues, there's no concept of 
RID and so no need for retrieve_blast in LocalServerBlast; instead, 
submit_blast returns an array of Bio::Tools::BPlite or 
Bio::Tools::Blast objects.
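
Because the result is an array, callers need to ask for it in list context. A minimal illustration in plain Perl, where fake_submit is a hypothetical stand-in for any sub that returns one report per query (it is not part of LocalServerBlast.pm):

```perl
use strict;
use warnings;

# Hypothetical stand-in: returns one "report" string per input sequence.
sub fake_submit {
    my @reports = map { "report for $_" } @_;
    return @reports;
}

my @all     = fake_submit('seqA', 'seqB');   # list context: both reports
my ($first) = fake_submit('seqA', 'seqB');   # list assignment: first report only
my $count   = fake_submit('seqA', 'seqB');   # scalar context: the count, not a report
```

The parentheses around $first are what force list context; without them the caller gets the number of reports.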

I've also made a slight change to how RemoteBlast.pm checks the return 
status of blast jobs. The HTML returned from the NCBI server contains a 
status line near the top of the file so I just read far enough in the 
response file to pull that information out and then use that, rather 
than the filesize to decide if the job is ready, waiting, or failed.
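
As a self-contained sketch of that check (not the patched RemoteBlast.pm code itself): qblast_status is a hypothetical helper, and it assumes the response carries a Status=... line ahead of the QBlastInfoEnd marker, as in the responses I have seen.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the job status out of the text of a QBlast
# response. Returns the upper-cased status word (e.g. 'WAITING', 'READY'),
# or undef if no Status line appears before QBlastInfoEnd.
sub qblast_status {
    my ($content) = @_;
    for my $line (split /\n/, $content) {
        return uc $1 if $line =~ /^\s*Status=(\w+)/i;
        last if $line =~ /QBlastInfoEnd/;   # status block is over, stop reading
    }
    return;
}
```

A caller can then treat 'WAITING' as "poll again", 'READY' as "parse the report", and anything else, including undef, as a failure.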

I've attached the patch files for Perl.pm and RemoteBlast.pm (cvs diff 
-aur against both 1.4 and 1.5) as well as the LocalServerBlast.pm file. 
I'm not sure what the protocol is for "cared for", "copyright" and 
"author" notices. I've mostly just modified your and Ewan Birney's 
code. I'd be happy to care for these modules.

I haven't written any code for the test suite yet but I'll start 
working on that soon. Also, upon further reflection, I decided not to 
incorporate the support for accession# and gi into blast_sequence. If 
anyone wants that, I can put it back in but for a first pass I didn't 
want to change Perl.pm too much.

I've tested these modules with wwwBlast 2.2.9 and 2.2.10 under MacOS X.

All the best,
Madeleine

> Dear Madeleine -
>
> Great.  Would love for someone to be a maintainer and keeper of this 
> module. All your changes sound great.  I think a new function in 
> Bio::Perl would be the best way to allow providing of a new 
> localserver.  Note that Bio::Perl is supposed to really just be a 
> convenience of just having a list of functions for new users - so 
> there is room for new *well named* functions to be added there.
>
> As for applying the changes - you can submit a patch of differences 
> for your new code versus the current CVS HEAD by making changes and 
> then running "cvs diff -aur " to get the changes in a patch format.  
> You'll want to checkout the code via CVS first - 
> http://cvs.open-bio.org/.  We have to give you an authorized account 
> to be able to apply changes back to the repository though.  Once 
> you've submitted a few fixes to show you understand the toolkit and 
> the coding practices we can see about getting you that account.
>
> -jason
> On Nov 24, 2004, at 4:22 AM, Madeleine Lemieux wrote:
>
>> I've just recently started exploring BioPerl (v.1.4). So far it's 
>> been fun if a little daunting.
>>
>> As an exercise, I decided to try changing the blast_sequence subroutine 
>> in Perl.pm so that it would let me send the query to either my local 
>> wwwBlast server or out over my slow, flakey internet connection to 
>> the QBlast server. I did this by adding a parameter LOCALSERVER 
>> which, if set to a URL, redirects the query to that server (e.g. 
>> LOCALSERVER => http://localhost/blast/blast.cgi); otherwise, it 
>> defaults to the server at the NCBI.
>>
>> I've also added support for query by accession or gi # (QBlast only 
>> since wwwBlast doesn't support such queries), submission of multiple 
>> sequences (either in a file or string or string variable), as well as 
>> passing any of the QBlast Put and Get options as parameters. Unlike 
>> the original one, my blast_sequence returns an array of results, not 
>> a single result, so that code calling my version of blast_sequence in 
>> a scalar context would incorrectly get the size of the array.
>>
>> Apart from Perl.pm, the only other file that I had to change was 
>> Bio/Tools/Run/RemoteBlast.pm. I just downloaded the latest release 
>> candidate, 1.5.RC1, and noticed that RemoteBlast.pm has been changed 
>> in ways that overlap with the changes I've made while maintaining 
>> backwards compatibility which my version does not since I was only 
>> working for myself at the time.
>>
>> So my question is: is anyone interested in getting the code I've 
>> developed? If so, a corollary question is: how do I go about 
>> contributing the code? I can pretty easily forward port my changes to 
>> RemoteBlast.pm to the 1.5.RC1 version in order to use the nice 
>> "validate by regexp" trick introduced there and to provide backwards 
>> compatibility. I'm not sure what to do about the Perl.pm module, 
>> though. I guess that the easiest would be to change the name of my 
>> blast_sequence subroutine and add it to Perl.pm since there is no 
>> object interface being altered.
>>
>> As I was working on this, I noticed that the HTML stripping that gets 
>> done on the response from the QBlast server fails on wwwBlast output 
>> since the format of the HTML is a little different (manifests as a 
>> "can't find mid-line data" error when processing the alignments). So 
>> I wrote a generic stripper which removes all HTML tags except those 
>> that contain an end-of-line within the tag itself or an internal, 
>> un-escaped closing angle bracket (>) which wouldn't be valid HTML 
>> anyway, I think. It doesn't touch single angle brackets (>) such as 
>> those found at the beginning of descriptions (>gi ...).
>> 	# html stripper
>> 	# remove simple and closing tags first and then leftover tags
>> 	$str =~ s/<(\/)?\w+>//g;
>> 	$str =~ s/<\D+([^>]*\n*)*>//g;
>>
>> Also, when retrieving RIDs in RemoteBlast.pm (retrieve_rid), the test 
>> for completion relies on the size of the file containing the reply. 
>> This has failed at least once for me. Since there is a status line 
>> near the top of the file in the response, it seems to me that 
>> something along the lines of the following might be more robust:
>> 	# read file until QBlastInfoEnd to pull out status
>> 	my $status = '';
>> 	my $junk = '';
>> 	open(TMP, $tempfile) or $self->throw("cannot open $tempfile");
>> 	while( defined (my $line = <TMP>) ) {
>> 		last if ($line =~ /QBlastInfoEnd/);
>> 		($junk, $status) = (split /=/, $line)
>> 			if ($line =~ /waiting|ready/i);
>> 	}
>> 	close TMP;
>>
>> 	if( $response->is_success ) {
>> 		if ( $status =~ /waiting/i ) {
>> 			return 0;
>> 		} elsif ( $status =~ /ready/i ) {
>> 			...
>> 		} else { # failed
>> 			...
>> 		}
>> 	} ...
>>
>> Finally, let me end by thanking all the BioPerl contributors for 
>> their fine work.
>>
>> Regards,
>> Madeleine
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/


-------------- next part --------------
A non-text attachment was scrubbed...
Name: RemoteBlast.pm.diff-1.4
Type: application/octet-stream
Size: 22148 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/RemoteBlast.pm.diff-1-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RemoteBlast.pm.diff-1.5
Type: application/octet-stream
Size: 22150 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/RemoteBlast.pm.diff-1-0003.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Perl.pm.diff-1.4
Type: application/octet-stream
Size: 21885 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/Perl.pm.diff-1-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Perl.pm.diff-1.5
Type: application/octet-stream
Size: 19626 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/Perl.pm.diff-1-0003.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: LocalServerBlast.pm
Type: application/octet-stream
Size: 16943 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/LocalServerBlast-0001.obj