[Bioperl-l] Blast parsing exception
Marc Logghe
MarcL@DEVGEN.com
Fri, 19 Oct 2001 08:54:48 +0200
Hi all,
When parsing a file that contains multiple BLAST reports, I get the following error messages as soon
as the parser reaches a 'No hits found' report:
-------------------- WARNING ---------------------
MSG: Can't determine query sequence name from BLAST report.
---------------------------------------------------
-------------------- EXCEPTION --------------------
MSG: Unexpected error during read: -------------------- EXCEPTION --------------------
MSG: Can't determine sequence length from BLAST report.
STACK Bio::Tools::Blast::_set_length /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:2550
STACK Bio::Tools::Blast::_parse_header /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1962
STACK Bio::Tools::Blast::__ANON__ /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1761
STACK (eval) /usr/lib/perl5/site_perl/5.005/Bio/Root/IOManager.pm:752
STACK Bio::Root::IOManager::read /usr/lib/perl5/site_perl/5.005/Bio/Root/IOManager.pm:736
STACK Bio::Root::Object::read /usr/lib/perl5/site_perl/5.005/Bio/Root/Object.pm:1511
STACK Bio::Tools::Blast::_parse_blast_stream /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1615
STACK Bio::Tools::Blast::parse /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1465
STACK toplevel ./get_alias.pl:9
-------------------------------------------
STACK Bio::Root::IOManager::read /usr/lib/perl5/site_perl/5.005/Bio/Root/IOManager.pm:763
STACK Bio::Root::Object::read /usr/lib/perl5/site_perl/5.005/Bio/Root/Object.pm:1511
STACK Bio::Tools::Blast::_parse_blast_stream /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1615
STACK Bio::Tools::Blast::parse /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1465
STACK toplevel ./get_alias.pl:9
-------------------------------------------
If I remove this 'No hits' report, the file parses fine.
Can somebody help me out with this? I have not been able to pinpoint the problem.
Thanks.
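One possible stopgap, based on the observation that the file parses fine once the 'No hits' report is removed, is to filter such reports out before handing the stream to the parser. Below is a minimal Perl sketch of that idea; it does not fix the parser itself. The helper names split_blast_reports and drop_no_hit_reports are hypothetical (they are not BioPerl functions), and the code assumes that each report begins with a program header line such as "BLASTP 2.2.1 [Jul-12-2001]" and that empty reports contain the "***** No hits found ******" banner.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a concatenated BLAST
# output into individual reports. Assumes every report starts with a
# program header line such as "BLASTP 2.2.1 [Jul-12-2001]".
sub split_blast_reports {
    my ($text) = @_;
    # Zero-width split just before each header line; /m makes ^ match
    # at the start of every line, not only at the start of the string.
    my @chunks = split /^(?=T?BLAST[NPX] )/m, $text;
    # Discard any text that precedes the first header line.
    return grep { /^T?BLAST[NPX] / } @chunks;
}

# Hypothetical helper: rebuild the output without the reports that
# contain the "No hits found" banner.
sub drop_no_hit_reports {
    my ($text) = @_;
    my @kept = grep { !/\*+ No hits found \*+/ } split_blast_reports($text);
    return join '', @kept;
}
```

For example, print drop_no_hit_reports(do { local $/; <STDIN> }); writes a filtered copy of the input that can then be parsed. Note that queries without hits are then simply absent from the parsed results, so the script's STDERR listing of hit-less queries would no longer report them.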
To make the exception easy to reproduce, I extracted the relevant part of the BLAST results and
feed it to the script on STDIN (aliased to the __DATA__ section):
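#!/usr/local/bin/perl -w
use strict;
use Bio::Tools::Blast qw(:obj);   # provides the static $Blast object

# Read the BLAST reports embedded after __DATA__ as if they came from STDIN.
*STDIN = *DATA;

$Blast->parse(
    # -file => '../wp18_wp63.res',
    # -file => '../test.res',
    -parse     => 1,
    -exec_func => \&process_blast,   # callback invoked for each parsed report
);

# Print "query<TAB>hit<TAB>expect" when the first hit is not the query
# itself; list queries without hits on STDERR.
sub process_blast {
    my $blastObj = shift;
    my $hit      = $blastObj->hit;     # first hit in the report, if any
    my $qname    = $blastObj->query;
    if ($hit) {
        my $hitname = $hit->name;
        printf("%s\t%s\t%s\n", $qname, $hitname, $hit->expect)
            if $qname ne $hitname;
    }
    else {
        print STDERR "$qname\n";
    }
    $blastObj->destroy;    # release the report object
}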
__DATA__
BLASTP 2.2.1 [Jul-12-2001]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= ZK994.5 CE15491 (ST.LOUIS) TR:O44085 protein_id:AAB88613.1
(339 letters)
Database: wp63
20,100 sequences; 8,819,854 total letters
Searching.........................................done
                                                                    Score   E
Sequences producing significant alignments:                        (bits) Value

T24H5.1 CE26008 (ST.LOUIS) protein_id:AAK84578.1                      620 e-178
T23B12.9 CE14042 (ST.LOUIS) TR:O17008 protein_id:AAB69941.1           616 e-177
F07C7.1 CE07032 (ST.LOUIS) TR:Q19161 protein_id:AAA85753.1            463 e-131
>T24H5.1 CE26008 (ST.LOUIS) protein_id:AAK84578.1
Length = 414
Score = 620 bits (1598), Expect = e-178
Identities = 300/339 (88%), Positives = 300/339 (88%)
Query: 1   MINNRPLVAHARSPNDMITLRPMDFMIPGVMIEXXXXXXXXXXXXXXXXXXXXHLEKFES 60
           MINNRPLVAHARSPNDMITLRPMDFMIPGVMIE                    HLEKFES
Sbjct: 76  MINNRPLVAHARSPNDMITLRPMDFMIPGVMIETPRTPADSPTTSTTETRTRAHLEKFES 135

Query: 61  ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 120
           ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ
Sbjct: 136 ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 195

Query: 121 SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGXXXXXXXXXXXXXXXXXXX 180
           SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTG
Sbjct: 196 SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGPDNDTPANDTNNDTDKDTA 255

Query: 181 GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 240
           GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID
Sbjct: 256 GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 315

Query: 241 IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 300
           IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK
Sbjct: 316 IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 375

Query: 301 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 339
           APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF
Sbjct: 376 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 414
>T23B12.9 CE14042 (ST.LOUIS) TR:O17008 protein_id:AAB69941.1
Length = 1744
Score = 616 bits (1589), Expect = e-177
Identities = 299/339 (88%), Positives = 299/339 (88%)
Query: 1    MINNRPLVAHARSPNDMITLRPMDFMIPGVMIEXXXXXXXXXXXXXXXXXXXXHLEKFES 60
            MINNRPLVAHARSPNDMITLRPMDFMIPGVMIE                    HLEKFES
Sbjct: 1406 MINNRPLVAHARSPNDMITLRPMDFMIPGVMIETPRTPADSPTTSTTETRTRAHLEKFES 1465

Query: 61   ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 120
            ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPN VSRHRWPLALVVQVNQ
Sbjct: 1466 ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNNVSRHRWPLALVVQVNQ 1525

Query: 121  SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGXXXXXXXXXXXXXXXXXXX 180
            SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTG
Sbjct: 1526 SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGPDNDTPANDTNNDTDKDTA 1585

Query: 181  GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 240
            GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID
Sbjct: 1586 GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 1645

Query: 241  IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 300
            IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK
Sbjct: 1646 IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 1705

Query: 301  APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 339
            APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF
Sbjct: 1706 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 1744
>F07C7.1 CE07032 (ST.LOUIS) TR:Q19161 protein_id:AAA85753.1
Length = 1879
Score = 463 bits (1192), Expect = e-131
Identities = 239/339 (70%), Positives = 248/339 (72%), Gaps = 31/339 (9%)
Query: 1    MINNRPLVAHARSPNDMITLRPMDFMIPGVMIEXXXXXXXXXXXXXXXXXXXXHLEKFES 60
            MINNRPLVAHARSPNDMI LRPMDFMIPGVMIE                    HLEKFES
Sbjct: 1572 MINNRPLVAHARSPNDMIALRPMDFMIPGVMIETPRTPADSPTTSTTEIRTRAHLEKFES 1631

Query: 61   ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 120
            ALERLWTIWTFGVMLILREVSHKHKRCCD KPEVGDVVIIN NYVSRHRWPLALVVQVNQ
Sbjct: 1632 ALERLWTIWTFGVMLILREVSHKHKRCCDPKPEVGDVVIINTNYVSRHRWPLALVVQVNQ 1691

Query: 121  SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGXXXXXXXXXXXXXXXXXXX 180
            SKRDGEIRTAV              LIPLETSRQ+IRHGTG
Sbjct: 1692 SKRDGEIRTAV--------------LIPLETSRQDIRHGTGPDNDTPANDTNNDTDKDTA 1737

Query: 181  GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 240
            GSDQCR CPTLPTPALLDFENSH A+ + P                 ++        EI
Sbjct: 1738 GSDQCRPCPTLPTPALLDFENSHFARRSQP-----------------KFSRTSVKNLEIS 1780

Query: 241  IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 300
            ++ GPDYDTNNPLF EDGE EDRPVEYVDP TAIPEIAYD+AETRLP GRTREYLGRKAK
Sbjct: 1781 LWIGPDYDTNNPLFHEDGEAEDRPVEYVDPITAIPEIAYDNAETRLPQGRTREYLGRKAK 1840

Query: 301  APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 339
            APYINYNHAEITRVLS+PSPPECCRFPVIPQESLNLKDF
Sbjct: 1841 APYINYNHAEITRVLSDPSPPECCRFPVIPQESLNLKDF 1879
Database: wp63
Posted date: Sep 14, 2001 11:28 AM
Number of letters in database: 8,819,854
Number of sequences in database: 20,100
Lambda K H
0.320 0.139 0.435
Gapped
Lambda K H
0.267 0.0410 0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 5,848,008
Number of Sequences: 20100
Number of extensions: 236898
Number of successful extensions: 507
Number of sequences better than 1.0e-100: 3
Number of HSP's better than 0.0 without gapping: 2
Number of HSP's successfully gapped in prelim test: 1
Number of HSP's that attempted gapping in prelim test: 498
Number of HSP's gapped (non-prelim): 3
length of query: 339
length of database: 8,819,854
effective HSP length: 98
effective length of query: 241
effective length of database: 6,850,054
effective search space: 1650863014
effective search space used: 1650863014
T: 11
A: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.8 bits)
S2: 930 (362.8 bits)
BLASTP 2.2.1 [Jul-12-2001]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= ZK994.6 CE15493 (ST.LOUIS) TR:O44088 protein_id:AAB88612.1
(113 letters)
Database: wp63
20,100 sequences; 8,819,854 total letters
Searching.........................................done
***** No hits found ******
Database: wp63
Posted date: Sep 14, 2001 11:28 AM
Number of letters in database: 8,819,854
Number of sequences in database: 20,100
Lambda K H
0.316 0.129 0.372
Gapped
Lambda K H
0.267 0.0410 0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 1,625,059
Number of Sequences: 20100
Number of extensions: 53280
Number of successful extensions: 140
Number of sequences better than 1.0e-100: 0
Number of HSP's better than 0.0 without gapping: 0
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 140
Number of HSP's gapped (non-prelim): 0
length of query: 113
length of database: 8,819,854
effective HSP length: 89
effective length of query: 24
effective length of database: 7,030,954
effective search space: 168742896
effective search space used: 168742896
T: 11
A: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.6 bits)
S2: 922 (359.8 bits)