[Bioperl-l] Easy switching from wwwBlast to QBlast

Fri Nov 26 15:55:09 EST 2004

Dear Madeleine -

Great.  We would love for someone to be a maintainer and keeper of this
module, and all your changes sound good.  I think a new function in
Bio::Perl would be the best way to let users specify a local server.
Note that Bio::Perl is really just meant to be a convenient list of
functions for new users, so there is room to add new *well-named*
functions there.

As for applying the changes: check out the code via CVS first
(http://cvs.open-bio.org/), make your changes against the current CVS
HEAD, and then run "cvs diff -u" to get them in patch format; you can
submit that patch.  You would need an authorized account to commit
changes back to the repository yourself, though.  Once you've submitted
a few fixes that show you understand the toolkit and its coding
practices, we can see about getting you that account.

-jason
On Nov 24, 2004, at 4:22 AM, Madeleine Lemieux wrote:

> I've just recently started exploring BioPerl (version 1.4). So far it's
> been fun, if a little daunting.
>
> As an exercise, I decided to try to change the blast_sequence
> subroutine in Perl.pm so that it would let me send the query either to
> my local wwwBlast server or, over my slow, flaky internet connection,
> to the QBlast server. I did this by adding a LOCALSERVER parameter
> which, if set to a URL, redirects the query to that server (e.g.
> LOCALSERVER => 'http://localhost/blast/blast.cgi'); otherwise, the
> query goes to the QBlast server at the NCBI.
>
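A minimal sketch of the LOCALSERVER dispatch described above, assuming a
hypothetical choose_server helper (not part of BioPerl) that takes named
arguments. The NCBI URL below is an assumed default and should be checked
against the one RemoteBlast.pm actually uses:

```perl
use strict;
use warnings;

# Assumed default QBlast endpoint; verify against RemoteBlast.pm.
my $NCBI_URL = 'http://www.ncbi.nlm.nih.gov/blast/Blast.cgi';

# Hypothetical helper: pick the server a query should be sent to.
# A LOCALSERVER argument that looks like an http(s) URL wins;
# anything else (including no argument) falls back to the NCBI server.
sub choose_server {
    my (%args) = @_;
    my $local = $args{LOCALSERVER};
    return $local if defined $local && $local =~ m{^https?://}i;
    return $NCBI_URL;
}

print choose_server(LOCALSERVER => 'http://localhost/blast/blast.cgi'), "\n";
print choose_server(), "\n";
```

Checking that LOCALSERVER looks like a URL means a typo falls back to
the NCBI server instead of failing; whether that silent fallback is
desirable is a judgment call.
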
> I've also added support for querying by accession or GI number
> (QBlast only, since wwwBlast doesn't support such queries), for
> submitting multiple sequences (in a file, a string, or a string
> variable), and for passing any of the QBlast Put and Get options as
> parameters. Unlike the original, my blast_sequence returns an array of
> results rather than a single result, so existing code that calls it in
> scalar context would incorrectly get the size of the array instead of a
> result.
>
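The scalar-context problem could be avoided with Perl's built-in
wantarray: return the full list in list context and only the first
result in scalar context. A minimal sketch, assuming a hypothetical
blast_many function that returns plain strings in place of BLAST result
objects:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a blast_sequence-style function that may
# produce several results; strings take the place of report objects.
sub blast_many {
    my @queries = @_;
    my @results = map { "report for $_" } @queries;
    # List context: every result. Scalar context: only the first one,
    # which is what a caller expecting a single result wants.
    return wantarray ? @results : $results[0];
}

my @all   = blast_many('seq1', 'seq2');
my $first = blast_many('seq1', 'seq2');
print scalar(@all), "\n";   # 2
print "$first\n";           # report for seq1
```

Returning the first result in scalar context keeps single-query callers
of the old interface working while new callers can ask for the list.
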
> Apart from Perl.pm, the only other file that I had to change was
> Bio/Tools/Run/RemoteBlast.pm. I just downloaded the latest release
> candidate, 1.5.RC1, and noticed that RemoteBlast.pm has been changed
> in ways that overlap with my changes. The 1.5.RC1 version maintains
> backwards compatibility; mine does not, since I was only working for
> myself at the time.
>
> So my question is: is anyone interested in getting the code I've
> developed? If so, a corollary question is: how do I go about
> contributing it? I can pretty easily forward-port my RemoteBlast.pm
> changes to the 1.5.RC1 version, both to use the nice "validate by
> regexp" trick introduced there and to provide backwards compatibility.
> I'm not sure what to do about the Perl.pm module, though. I guess the
> easiest approach would be to rename my blast_sequence subroutine and
> add it to Perl.pm, since no object interface is being altered.
>
> As I was working on this, I noticed that the HTML stripping done on
> the response from the QBlast server fails on wwwBlast output because
> the HTML is formatted a little differently (this shows up as a "can't
> find mid-line data" error when the alignments are processed). So I
> wrote a generic stripper that removes all HTML tags except those that
> contain an end-of-line within the tag itself or an internal, unescaped
> closing angle bracket (>), which I think wouldn't be valid HTML anyway.
> It doesn't touch lone angle brackets such as the > at the beginning of
> descriptions (>gi ...).
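> 	# html stripper: remove simple and closing tags first (<b>, </pre>),
> 	# then tags with attributes (<a href="...">); [^<>\n]* keeps each
> 	# match on one line and inside one tag, so text between tags survives
> 	$str =~ s/<\/?\w+>//g;
> 	$str =~ s/<[!\/]?[A-Za-z][^<>\n]*>//g;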
>
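The intent of the stripper above can be sketched as a self-contained
script. This is an illustration, not BioPerl code: strip_html is a
hypothetical name, and the single pattern assumes every tag opens with
'<' followed by a letter (optionally after '/' or '!') and closes on
the same line, so a description line starting with '>' is never matched.
It does not attempt to handle HTML comments (<!-- ... -->).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): strip HTML tags from a
# BLAST report while leaving FASTA-style '>' characters alone.
sub strip_html {
    my ($str) = @_;
    # '<', optional '/' or '!', a letter, then anything except '<', '>'
    # or newline up to the closing '>'. Tags split across lines are kept.
    $str =~ s/<[!\/]?[A-Za-z][^<>\n]*>//g;
    return $str;
}

my $html = qq{<PRE><a href="http://localhost/x?id=1">gi|12345</a> some protein\n}
         . qq{>gi|12345 some protein\n</PRE>\n};
print strip_html($html);
```

Because [^<>\n]* cannot run past a '<', '>' or newline, the text
between two tags is never swallowed along with them.
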
> Also, when retrieving RIDs in RemoteBlast.pm (retrieve_rid), the test 
> for completion relies on the size of the file containing the reply. 
> This has failed at least once for me. Since there is a status line 
> near the top of the file in the response, it seems to me that 
> something along the lines of the following might be more robust:
> 	# read the reply until QBlastInfoEnd to pull out the status
> 	my $status = '';
> 	open(my $tmp, '<', $tempfile) or $self->throw("cannot open $tempfile: $!");
> 	while ( defined(my $line = <$tmp>) ) {
> 	    last if $line =~ /QBlastInfoEnd/;
> 	    # keep the value after '=' on the status line (e.g. "WAITING")
> 	    (undef, $status) = split /=/, $line if $line =~ /waiting|ready/i;
> 	}
> 	close $tmp;
>
> 	if ( $response->is_success ) {
> 	    if ( $status =~ /waiting/i ) {
> 	        return 0;
> 	    } elsif ( $status =~ /ready/i ) {
> 	        ...
> 	    } else { # failed
> 	        ...
> 	    }
> 	} ...
>
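The status check can also be written as a small standalone function. A
minimal sketch, assuming the QBlast reply contains a block of the form
QBlastInfoBegin ... Status=WAITING ... QBlastInfoEnd (the exact layout
is an assumption here) and using a hypothetical qblast_status helper; an
in-memory filehandle stands in for RemoteBlast's temporary file:

```perl
use strict;
use warnings;

# Hypothetical helper: read a QBlast reply from a filehandle and return
# the status value (e.g. WAITING, READY) found before QBlastInfoEnd,
# or '' if there is none. Assumes lines of the form "Status=VALUE".
sub qblast_status {
    my ($fh) = @_;
    my $status = '';
    while ( defined( my $line = <$fh> ) ) {
        last if $line =~ /QBlastInfoEnd/;
        if ( $line =~ /^\s*Status=(\w+)/i ) {
            $status = uc $1;
        }
    }
    return $status;
}

# In-memory stand-in for the temporary file holding the server reply.
my $reply = "<!--QBlastInfoBegin\n\tStatus=WAITING\nQBlastInfoEnd\n-->\n";
open( my $fh, '<', \$reply ) or die "cannot open in-memory reply: $!";
my $status = qblast_status($fh);
close $fh;

if    ( $status eq 'WAITING' ) { print "not done yet, poll again later\n" }
elsif ( $status eq 'READY' )   { print "results ready\n" }
else                           { print "failed or unknown status: '$status'\n" }
```

Keying on "Status=" rather than on any line that mentions "waiting" or
"ready" avoids false matches, and stopping at QBlastInfoEnd keeps the
report body from being scanned at all.
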
> Finally, I'd like to thank all the BioPerl contributors for their fine
> work.
>
> Regards,
> Madeleine
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/