[Bioperl-l] Easy switching from wwwBlast to QBlast
Jason Stajich
jason.stajich at duke.edu
Fri Nov 26 15:55:09 EST 2004
Dear Madeleine -

Great. I would love for someone to be the maintainer and keeper of this
module. All your changes sound great. I think a new function in
Bio::Perl would be the best way to let users supply their own local
server. Note that Bio::Perl is really just meant to be a convenience, a
list of functions for new users - so there is room for new *well named*
functions to be added there.

As for applying the changes - first check out the code via CVS
(http://cvs.open-bio.org/). Then make your changes and run
"cvs diff -aur" to get the differences between your new code and the
current CVS HEAD in patch format, and submit that patch. We have to
give you an authorized account before you can apply changes back to
the repository yourself, though. Once you've submitted a few fixes to
show you understand the toolkit and the coding practices, we can see
about getting you that account.

-jason

On Nov 24, 2004, at 4:22 AM, Madeleine Lemieux wrote:
> I've just recently started exploring BioPerl (v.1.4). So far it's been
> fun if a little daunting.
>
> As an exercise, I decided to try changing the blast_sequence subroutine
> in Perl.pm so that it would let me send the query to either my local
> wwwBlast server or out over my slow, flaky internet connection to the
> QBlast server. I did this by adding a parameter LOCALSERVER which, if
> set to a URL, redirects the query to that server (e.g. LOCALSERVER =>
> http://localhost/blast/blast.cgi); otherwise, it defaults to the
> server at the NCBI.
>
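A minimal sketch of the LOCALSERVER fallback described above. The helper name blast_server_url and the default URL are assumptions made for illustration; neither is taken from Perl.pm or RemoteBlast.pm.

```perl
use strict;
use warnings;

# Assumed default endpoint; the URL actually used by BioPerl may differ.
my $DEFAULT_URL = 'https://blast.ncbi.nlm.nih.gov/Blast.cgi';

# Hypothetical helper: choose the server URL from named parameters.
# A defined, non-empty LOCALSERVER wins; otherwise fall back to the default.
sub blast_server_url {
    my (%args) = @_;
    my $local = $args{LOCALSERVER};
    return ( defined $local && length $local ) ? $local : $DEFAULT_URL;
}

print blast_server_url( LOCALSERVER => 'http://localhost/blast/blast.cgi' ), "\n";
print blast_server_url(), "\n";
```

Keeping the choice in one small function means the rest of the submission code does not need to know which server it is talking to.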
> I've also added support for query by accession or gi number (QBlast
> only, since wwwBlast doesn't support such queries), submission of
> multiple sequences (in a file, a literal string, or a string
> variable), and passing any of the QBlast Put and Get options as
> parameters. Unlike the original, my blast_sequence returns an array of
> results rather than a single result, so existing code that calls it in
> scalar context would incorrectly get the size of the array.
>
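The scalar-context pitfall mentioned above can be illustrated with a small sketch. The functions blast_many and blast_compat are hypothetical stand-ins, not the real blast_sequence; the wantarray shim is only one possible way to keep old scalar callers working, not something the original code is known to do.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a function that returns a list of results.
sub blast_many {
    my @results = ( 'report_A', 'report_B', 'report_C' );
    return @results;
}

my @all     = blast_many();    # list context: all three results
my ($first) = blast_many();    # list context: first result only
my $count   = blast_many();    # scalar context: the number of results (3)

# One possible compatibility shim: scalar callers get the first result,
# list callers get everything.
sub blast_compat {
    my @results = ( 'report_A', 'report_B', 'report_C' );
    return wantarray ? @results : $results[0];
}

my $single = blast_compat();   # 'report_A'
print "count=$count first=$first single=$single\n";
```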
> Apart from Perl.pm, the only other file that I had to change was
> Bio/Tools/Run/RemoteBlast.pm. I just downloaded the latest release
> candidate, 1.5.RC1, and noticed that RemoteBlast.pm has been changed
> in ways that overlap with the changes I've made. The new version
> maintains backwards compatibility, which mine does not, since I was
> only working for myself at the time.
>
> So my question is: is anyone interested in getting the code I've
> developed? If so, a corollary question is: how do I go about
> contributing the code? I can pretty easily forward port my changes to
> RemoteBlast.pm to the 1.5.RC1 version in order to use the nice
> "validate by regexp" trick introduced there and to provide backwards
> compatibility. I'm not sure what to do about the Perl.pm module,
> though. I guess that the easiest would be to change the name of my
> blast_sequence subroutine and add it to Perl.pm since there is no
> object interface being altered.
>
> As I was working on this, I noticed that the HTML stripping that gets
> done on the response from the QBlast server fails on wwwBlast output
> since the format of the HTML is a little different (manifests as a
> "can't find mid-line data" error when processing the alignments). So I
> wrote a generic stripper which removes all HTML tags except those that
> contain an end-of-line within the tag itself or an internal,
> un-escaped closing angle bracket (>) which wouldn't be valid HTML
> anyway, I think. It doesn't touch single angle brackets (>) such as
> those found at the beginning of descriptions (>gi ...).
>
>     # html stripper
>     # remove simple and closing tags first and then leftover tags
>     $str =~ s/<(\/)?\w+>//g;
>     $str =~ s/<\D+([^>]*\n*)*>//g;
>
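A minimal sketch of the stripping behaviour described above, written with a single simplified pattern rather than the two substitutions quoted. The helper name strip_html_tags and the sample input are made up for illustration. It assumes a tag starts with a letter (after an optional slash) and contains no newline or angle bracket, so HTML comments and multi-line tags are deliberately left alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): remove single-line HTML tags.
sub strip_html_tags {
    my ($str) = @_;
    # '<', optional '/', a letter, then any run of characters other than
    # '<', '>' or newline, then the closing '>'.
    $str =~ s{</?[A-Za-z][^<>\n]*>}{}g;
    return $str;
}

# Invented sample resembling BLAST HTML output.
my $html = <<'END';
<PRE>
<a name=12345></a>>gi|12345|ref|NP_000001.1| hypothetical protein
<b>Score</b> = 42 bits
</PRE>
END

print strip_html_tags($html);
```

Because the pattern requires a letter right after the opening bracket, a lone ">" at the start of a description line (>gi ...) is not touched.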
> Also, when retrieving RIDs in RemoteBlast.pm (retrieve_rid), the test
> for completion relies on the size of the file containing the reply.
> This has failed at least once for me. Since there is a status line
> near the top of the file in the response, it seems to me that
> something along the lines of the following might be more robust:
>     # read file until QBlastInfoEnd to pull out status
>     my $status = '';
>     my $junk = '';
>     open(TMP, $tempfile) or $self->throw("cannot open $tempfile");
>     while( defined (my $line = <TMP>) ) {
>         last if ($line =~ /QBlastInfoEnd/);
>         ($junk, $status) = (split /=/, $line) if ($line =~ /waiting|ready/i);
>     }
>     close TMP;
>
>     if( $response->is_success ) {
>         if ( $status =~ /waiting/i ) {
>             return 0;
>         } elsif ( $status =~ /ready/i ) {
>             ...
>         } else { # failed
>             ...
>         }
>     } ...
>
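The status-line idea above can be tried in isolation with a small sketch. The helper qblast_status is hypothetical and not part of RemoteBlast.pm, and the sample reply is only assumed to resemble the status block in a QBlast response. It reads from a lexical filehandle with three-argument open, here an in-memory one so the example runs without a temp file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the status word found before QBlastInfoEnd
# (upper-cased, e.g. 'WAITING' or 'READY'), or undef if none is seen.
sub qblast_status {
    my ($fh) = @_;
    while ( defined( my $line = <$fh> ) ) {
        last if $line =~ /QBlastInfoEnd/;
        return uc $1 if $line =~ /^\s*Status\s*=\s*(\w+)/i;
    }
    return;
}

# Assumed shape of the status block near the top of a reply.
my $reply = <<"END";
<!--
QBlastInfoBegin
\tStatus=WAITING
QBlastInfoEnd
-->
END

open( my $fh, '<', \$reply ) or die "cannot open in-memory reply: $!";
my $status = qblast_status($fh);
close $fh;

print defined $status ? "status: $status\n" : "no status line found\n";
```

Matching on an explicit Status= line, rather than on any line containing "waiting" or "ready", avoids picking up those words from unrelated text in the reply.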
> Finally, let me end by thanking all the BioPerl contributors for their
> fine work.
>
> Regards,
> Madeleine
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/