[Bioperl-l] Easy switching from wwwBlast to QBlast
Madeleine Lemieux
mlemieux at bioinfo.ca
Wed Nov 24 04:22:26 EST 2004
I've just recently started exploring BioPerl (v.1.4). So far it's been
fun if a little daunting.
As an exercise, I decided to try changing the blast_sequence subroutine
in Perl.pm so that it would let me send the query to either my local
wwwBlast server or out over my slow, flaky internet connection to the
QBlast server. I did this by adding a parameter LOCALSERVER which, if
set to a URL, redirects the query to that server (e.g. LOCALSERVER =>
http://localhost/blast/blast.cgi); otherwise, it defaults to the server
at the NCBI.
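
A minimal sketch of the LOCALSERVER fallback described above. This is
not the actual patch: choose_blast_url is a hypothetical helper, and
the NCBI URL is an assumed default that should be checked against the
one your copy of RemoteBlast.pm uses.

```perl
use strict;
use warnings;

# Assumed default endpoint (not taken from the original post).
my $NCBI_URL = 'https://blast.ncbi.nlm.nih.gov/Blast.cgi';

# Hypothetical helper: return the URL the query should be sent to.
# If LOCALSERVER is set to a non-empty value, use it; otherwise fall
# back to the NCBI server.
sub choose_blast_url {
    my (%args) = @_;
    my $local = $args{LOCALSERVER};
    return ( defined $local && length $local ) ? $local : $NCBI_URL;
}

print choose_blast_url( LOCALSERVER => 'http://localhost/blast/blast.cgi' ), "\n";
print choose_blast_url(), "\n";
```

Keeping the choice in one place means the rest of the submission code
does not need to know which kind of server it is talking to.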
I've also added support for queries by accession or GI number (QBlast
only, since wwwBlast doesn't support such queries), submission of
multiple sequences (in a file, a string literal, or a string variable),
and passing any of the QBlast Put and Get options as parameters. Unlike
the original, my blast_sequence returns an array of results rather than
a single result, so existing code that calls it in scalar context would
get the size of the array instead of a result.
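
To illustrate the scalar-context problem, here is a small sketch.
fake_blast_sequence is a hypothetical stand-in that returns plain
strings instead of real result objects; only the context behaviour is
the point.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a blast_sequence that returns an array.
sub fake_blast_sequence {
    my @results = ( 'result for seq1', 'result for seq2' );
    return @results;
}

my @all     = fake_blast_sequence();    # list context: both results
my $count   = fake_blast_sequence();    # scalar context: the array size
my ($first) = fake_blast_sequence();    # list assignment: first result

print scalar(@all), " results; scalar call gave '$count'; first is '$first'\n";
```

Callers written for the original single-result interface would need
the parenthesised form, my ($result) = ..., to keep working.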
Apart from Perl.pm, the only other file that I had to change was
Bio/Tools/Run/RemoteBlast.pm. I just downloaded the latest release
candidate, 1.5.RC1, and noticed that RemoteBlast.pm has been changed in
ways that overlap with the changes I've made. The 1.5.RC1 version
maintains backwards compatibility; mine does not, since I was only
working for myself at the time.
So my question is: is anyone interested in getting the code I've
developed? If so, a corollary question is: how do I go about
contributing the code? I can pretty easily forward port my changes to
RemoteBlast.pm to the 1.5.RC1 version in order to use the nice
"validate by regexp" trick introduced there and to provide backwards
compatibility. I'm not sure what to do about the Perl.pm module,
though. I guess that the easiest would be to change the name of my
blast_sequence subroutine and add it to Perl.pm since there is no
object interface being altered.
As I was working on this, I noticed that the HTML stripping that gets
done on the response from the QBlast server fails on wwwBlast output
since the format of the HTML is a little different (manifests as a
"can't find mid-line data" error when processing the alignments). So I
wrote a generic stripper which removes all HTML tags except those that
contain an end-of-line within the tag itself or an internal, un-escaped
closing angle bracket (>) which wouldn't be valid HTML anyway, I think.
It doesn't touch single angle brackets (>) such as those found at the
beginning of descriptions (>gi ...).
# html stripper
# remove simple and closing tags first and then leftover tags
$str =~ s/<(\/)?\w+>//g;
$str =~ s/<\D+([^>]*\n*)*>//g;
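
One thing to watch in the second pattern above: \D+ is greedy and also
matches '>' and newlines, so it can consume more than a single tag. A
more conservative variation is sketched below. It is not the code from
the patch; strip_html is a hypothetical name, and the sample text is
made up rather than real wwwBlast output. It only removes tags that
start with '<' plus a letter (optionally preceded by '/') and close on
the same line, so a lone '>' is never touched.

```perl
use strict;
use warnings;

# Hypothetical helper: strip single-line HTML tags from a string.
# A tag containing a newline is left alone, as is a bare '<' or '>'.
sub strip_html {
    my ($str) = @_;
    $str =~ s{</?[A-Za-z][^<>\n]*>}{}g;
    return $str;
}

# Made-up sample resembling an HTML-decorated description line.
my $html = "<PRE>\n><a name=12345></a>gi|12345| some protein\n"
         . "<b>Score</b> = 100 bits\n</PRE>\n";
print strip_html($html);
```

Because the character class excludes '<', '>' and newline, each match
is confined to one tag and the pattern cannot backtrack across lines.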
Also, when retrieving RIDs in RemoteBlast.pm (retrieve_rid), the test
for completion relies on the size of the file containing the reply.
This has failed at least once for me. Since there is a status line near
the top of the file in the response, it seems to me that something
along the lines of the following might be more robust:
# read file until QBlastInfoEnd to pull out status
my $status = '';
open(my $tmp_fh, '<', $tempfile)
    or $self->throw("cannot open $tempfile: $!");
while ( defined( my $line = <$tmp_fh> ) ) {
    last if $line =~ /QBlastInfoEnd/;
    # the status line has the form key=value; keep only the value
    (undef, $status) = split /=/, $line if $line =~ /waiting|ready/i;
}
close $tmp_fh;
if ( $response->is_success ) {
    if ( $status =~ /waiting/i ) {
        return 0;
    } elsif ( $status =~ /ready/i ) {
        ...
    } else { # failed
        ...
    }
} ...
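
The status check can also be tried in isolation. The sketch below is a
self-contained variation that works on a string rather than a temp
file; qblast_status is a hypothetical name, and the sample reply
assumes the status appears as a "Status=..." line before
QBlastInfoEnd.

```perl
use strict;
use warnings;

# Hypothetical helper: return the upper-cased Status value found before
# QBlastInfoEnd, or an empty string if no status line is seen.
sub qblast_status {
    my ($text) = @_;
    for my $line ( split /\n/, $text ) {
        last if $line =~ /QBlastInfoEnd/;
        return uc $1 if $line =~ /^\s*Status\s*=\s*(\w+)/i;
    }
    return '';
}

# Assumed shape of the info block near the top of a reply.
my $reply = "<!--\nQBlastInfoBegin\n\tStatus=WAITING\nQBlastInfoEnd\n-->\n";
print qblast_status($reply), "\n";
```

An empty return value can then be treated as the "failed" branch,
alongside any status other than WAITING or READY.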
Finally, let me end by thanking all the BioPerl contributors for their
fine work.
Regards,
Madeleine