[Bioperl-l] Remote Blast Failing
paul.boutros at utoronto.ca
Fri Sep 9 00:46:50 EDT 2005
Hello,
NCBI has changed their format for RemoteBlasts, and in some cases this is
causing SearchIO to fail. I think this is related to Jason's email from a few
weeks back:
http://bioperl.org/pipermail/bioperl-l/2005-August/019634.html
All nucleotide queries I tried fail on Perl 5.8.7 on both AIX and WinXP using
BioPerl 1.4 (the last stable release). The reason appears to be a change in the
HSP alignment format: the colon after "Query" and "Sbjct" has been removed. A
work-around for BioPerl 1.4 is to change line 1145 of Bio\SearchIO\blast.pm so
that the colon is optional:
-1145: if( /^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ ) {
+1145: if( /^((Query|Sbjct):{0,1}\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ ) {
I downloaded the CVS tarball, and this change is already in bioperl-live.
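To see what the one-character change above does, here is a minimal standalone
sketch that runs both patterns against an old-style and a new-style alignment
line. The helper name parse_hsp_line and the hash keys are made up for this
illustration and are not part of BioPerl; only the two regular expressions are
taken from the diff.

```perl
use strict;
use warnings;

# Pattern from BioPerl 1.4: the colon after Query/Sbjct is required.
my $old = qr/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/;
# Patched pattern: ":{0,1}" makes the colon optional.
my $new = qr/^((Query|Sbjct):{0,1}\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/;

# Hypothetical helper (not a BioPerl function): returns a hash ref with the
# parsed fields, or nothing if the line does not match the pattern.
sub parse_hsp_line {
    my ($line, $re) = @_;
    return unless $line =~ $re;
    return { type => $2, start => $3, seq => $4, end => $5 };
}

for my $line ('Query: 1 AGGCCGTTCACCAGTATGA 19',    # old format, with colon
              'Query 1 AGGCCGTTCACCAGTATGA 19') {   # new format, no colon
    printf "%-34s old: %-8s new: %s\n", $line,
        (parse_hsp_line($line, $old) ? 'match' : 'no match'),
        (parse_hsp_line($line, $new) ? 'match' : 'no match');
}
```

The old pattern only accepts the first line, while the patched pattern accepts
both, so reports in either format should still parse.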
However, one class of queries that *doesn't* work with bioperl-live is genomic
BLASTs. Here, NCBI has added several extra lines to the output.
Here's an example of the new format:
#################################################
gi|63489990|ref|NT_039206.4|Mm2_39246_34 Mus musculus chromosome 28.2 25
gi|63482841|ref|NT_078297.3|Mm1_78362_34 Mus musculus chromosome 28.2 25
ALIGNMENTS
>gi|63543231|ref|NT_039343.4|Mm6_39383_34 Mus musculus chromosome 6 genomic
contig, strain C57BL/6J
Length=21478308
Features flanking this part of subject sequence:
60669 bp at 5' side: hypothetical protein LOC101197
386242 bp at 3' side: RIKEN cDNA A930040G15
Score = 38.2 bits (19), Expect = 0.026
Identities = 19/19 (100%), Gaps = 0/19 (0%)
Strand=Plus/Plus
Query 1 AGGCCGTTCACCAGTATGA 19
|||||||||||||||||||
Sbjct 246489 AGGCCGTTCACCAGTATGA 246507
#################################################
And parsing a report containing this gives the error message:
#################################################
------------- EXCEPTION -------------
MSG: no data for midline Features flanking this part of subject sequence:
STACK Bio::SearchIO::blast::next_result C:/Perl/site/lib/Bio\SearchIO\blast.pm:1173
STACK toplevel test_blast.pl:9
--------------------------------------
#################################################
I can submit a patch, but I wanted to get input on the best way to handle this:
should the feature-data be stored somewhere, or just skipped?
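To make the two options concrete, here is a rough sketch of a pre-filter that
pulls the "Features flanking" block out of the report lines. It is not the
actual blast.pm code: the function name split_flanking_features and the hash
keys are hypothetical, and it assumes the block always looks like the example
above (a header line followed by "N bp at 5'/3' side: description" lines).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl. Returns two array refs: the
# report lines with the feature block removed, and the parsed feature lines.
sub split_flanking_features {
    my @lines = @_;
    my (@kept, @features);
    my $in_block = 0;
    for my $line (@lines) {
        if ($line =~ /^\s*Features flanking this part of subject sequence:/) {
            $in_block = 1;    # header line itself is dropped
            next;
        }
        if ($in_block) {
            # e.g. "60669 bp at 5' side: hypothetical protein LOC101197"
            if ($line =~ /^\s*(\d+)\s+bp at (\d)' side:\s*(.+?)\s*$/) {
                push @features,
                    { distance => $1, side => "$2'", description => $3 };
                next;
            }
            $in_block = 0;    # first non-feature line ends the block
        }
        push @kept, $line;
    }
    return (\@kept, \@features);
}

my @report_lines = (
    "Length=21478308",
    "Features flanking this part of subject sequence:",
    "60669 bp at 5' side: hypothetical protein LOC101197",
    "386242 bp at 3' side: RIKEN cDNA A930040G15",
    "Score = 38.2 bits (19), Expect = 0.026",
);
my ($kept, $features) = split_flanking_features(@report_lines);
print "$_\n" for @$kept;
print "$_->{side} side, $_->{distance} bp: $_->{description}\n" for @$features;
```

Skipping the data amounts to discarding the second return value; storing it
would mean attaching those hashes to the hit or HSP in some way, which is the
design question asked above.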
Here are the parameters used for this query in case somebody wants to recreate
it. I can also forward the blast report file if you're interested.
Sequence: aggccgttcaccagtatgac
Database: mouse_contig/ref_contig
Entrez Query: Mus musculus [ORGN]
A short-term fix for anybody else having this problem is to BLAST against the
database 'chromosome' instead of 'mouse_contig/ref_contig', and likewise for
other species.
Sorry for the long message!
Paul