[Bioperl-l] Blast parsing exception
Marc Logghe
MarcL@DEVGEN.com
Fri, 19 Oct 2001 16:27:28 +0200
Found the bug. When I completely commented out this code snippet from
Bio::Tools::Blast
# Incyte_Fix: Nasty Invisible Bug.
# Records in blast report are delimited by '>', but... when
# there are no hits for a query, there won't be a '>'. That
# causes several blast reports to run together in the data
# passed to this routine. Need to get rid of non-hits in data
if ($data =~ /.+(No hits? found.+)/so) {
    $data = $1;
}
# End Incyte_Fix
the exception was gone. Of course, the original problem that the
Incyte_Fix was intended to solve is now unfixed again. I am working on that.
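
A minimal sketch of why the snippet probably triggers the exception, using a
hypothetical, heavily shortened stand-in for the chunk of data that reaches
the routine (the real chunk boundaries inside Bio::Tools::Blast are an
assumption here). With the /s modifier the leading '.+' also matches newlines,
so everything before 'No hits found' is thrown away, including the 'Query='
and '(113 letters)' lines that the header parsing seems to rely on:

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for a 'No hits' report chunk;
# not the exact data the module passes around internally.
my $data = join "\n",
    'BLASTP 2.2.1 [Jul-12-2001]',
    'Query= ZK994.6 CE15493 (ST.LOUIS) TR:O44088 protein_id:AAB88612.1',
    '         (113 letters)',
    ' ***** No hits found ******',
    'Database: wp63',
    '';

# Same match-and-replace as in the Incyte_Fix block.
if ($data =~ /.+(No hits? found.+)/s) {
    $data = $1;
}

# Check which header lines survived the replacement.
my $has_query  = $data =~ /^Query=/m        ? 1 : 0;
my $has_length = $data =~ /\(\d+ letters\)/ ? 1 : 0;

print "Query line kept:  $has_query\n";
print "Length line kept: $has_length\n";
```

Both checks come out as 0 for this input, which would be consistent with the
"Can't determine query sequence name" warning and the "Can't determine
sequence length" exception in the original report.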
Marc
> -----Original Message-----
> From: Marc Logghe [mailto:MarcL@devgen.com]
> Sent: Friday, October 19, 2001 8:55 AM
> To: 'bioperl-l@bioperl.org'
> Subject: [Bioperl-l] Blast parsing exception
>
>
> Hi all,
> When parsing a BLAST output file containing multiple reports, I get these
> error messages as soon as it reaches the 'No hits' BLAST report:
> -------------------- WARNING ---------------------
> MSG: Can't determine query sequence name from BLAST report.
> ---------------------------------------------------
> -------------------- EXCEPTION --------------------
> MSG: Unexpected error during read: -------------------- EXCEPTION --------------------
> MSG: Can't determine sequence length from BLAST report.
> STACK Bio::Tools::Blast::_set_length /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:2550
> STACK Bio::Tools::Blast::_parse_header /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1962
> STACK Bio::Tools::Blast::__ANON__ /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1761
> STACK (eval) /usr/lib/perl5/site_perl/5.005/Bio/Root/IOManager.pm:752
> STACK Bio::Root::IOManager::read /usr/lib/perl5/site_perl/5.005/Bio/Root/IOManager.pm:736
> STACK Bio::Root::Object::read /usr/lib/perl5/site_perl/5.005/Bio/Root/Object.pm:1511
> STACK Bio::Tools::Blast::_parse_blast_stream /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1615
> STACK Bio::Tools::Blast::parse /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1465
> STACK toplevel ./get_alias.pl:9
> -------------------------------------------
>
> STACK Bio::Root::IOManager::read /usr/lib/perl5/site_perl/5.005/Bio/Root/IOManager.pm:763
> STACK Bio::Root::Object::read /usr/lib/perl5/site_perl/5.005/Bio/Root/Object.pm:1511
> STACK Bio::Tools::Blast::_parse_blast_stream /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1615
> STACK Bio::Tools::Blast::parse /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1465
> STACK toplevel ./get_alias.pl:9
> -------------------------------------------
> When you remove this 'no hits' report, it works fine.
> Can somebody help me out with this? I am not able to pinpoint the problem.
> Thanks.
> I took the relevant part of the BLAST results and passed it to STDIN so
> that the exception can be reproduced easily.
>
> #!/usr/local/bin/perl -w
>
> use strict;
> use Bio::Tools::Blast qw(:obj);
>
> *STDIN = *DATA;
>
> $Blast->parse
>     (
>      # -file => '../wp18_wp63.res',
>      # -file => '../test.res',
>      -parse     => 1,
>      -exec_func => \&process_blast,
>     );
>
> sub process_blast
> {
>     my $blastObj = shift;
>     my $hit      = $blastObj->hit;
>     my $qname    = $blastObj->query;
>     if ($hit)
>     {
>         my $hitname = $hit->name;
>         printf("%s\t%s\t%s\n", $qname, $hitname, $hit->expect)
>             if ($qname ne $hitname);
>     }
>     else
>     {
>         print STDERR "$qname\n";
>     }
>     $blastObj->destroy;
> }
>
> __DATA__
> BLASTP 2.2.1 [Jul-12-2001]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs", Nucleic Acids Res. 25:3389-3402.
>
> Query= ZK994.5 CE15491 (ST.LOUIS) TR:O44085 protein_id:AAB88613.1
> (339 letters)
>
> Database: wp63
> 20,100 sequences; 8,819,854 total letters
>
> Searching.........................................done
>
>
>                                                                Score     E
> Sequences producing significant alignments:                    (bits)  Value
>
> T24H5.1 CE26008 (ST.LOUIS) protein_id:AAK84578.1                 620   e-178
> T23B12.9 CE14042 (ST.LOUIS) TR:O17008 protein_id:AAB69941.1      616   e-177
> F07C7.1 CE07032 (ST.LOUIS) TR:Q19161 protein_id:AAA85753.1       463   e-131
>
> >T24H5.1 CE26008 (ST.LOUIS) protein_id:AAK84578.1
> Length = 414
>
> Score = 620 bits (1598), Expect = e-178
> Identities = 300/339 (88%), Positives = 300/339 (88%)
>
> Query: 1
> MINNRPLVAHARSPNDMITLRPMDFMIPGVMIEXXXXXXXXXXXXXXXXXXXXHLEKFES 60
> MINNRPLVAHARSPNDMITLRPMDFMIPGVMIE
> HLEKFES
> Sbjct: 76
> MINNRPLVAHARSPNDMITLRPMDFMIPGVMIETPRTPADSPTTSTTETRTRAHLEKFES 135
>
> Query: 61
> ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 120
>
> ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ
> Sbjct: 136
> ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 195
>
> Query: 121
> SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGXXXXXXXXXXXXXXXXXXX 180
> SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTG
>
> Sbjct: 196
> SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGPDNDTPANDTNNDTDKDTA 255
>
> Query: 181
> GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 240
>
> GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID
> Sbjct: 256
> GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 315
>
> Query: 241
> IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 300
>
> IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK
> Sbjct: 316
> IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 375
>
> Query: 301 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 339
> APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF
> Sbjct: 376 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 414
>
>
> >T23B12.9 CE14042 (ST.LOUIS) TR:O17008 protein_id:AAB69941.1
> Length = 1744
>
> Score = 616 bits (1589), Expect = e-177
> Identities = 299/339 (88%), Positives = 299/339 (88%)
>
> Query: 1
> MINNRPLVAHARSPNDMITLRPMDFMIPGVMIEXXXXXXXXXXXXXXXXXXXXHLEKFES 60
> MINNRPLVAHARSPNDMITLRPMDFMIPGVMIE
> HLEKFES
> Sbjct: 1406
> MINNRPLVAHARSPNDMITLRPMDFMIPGVMIETPRTPADSPTTSTTETRTRAHLEKFES
> 1465
>
> Query: 61
> ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 120
> ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPN
> VSRHRWPLALVVQVNQ
> Sbjct: 1466
> ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNNVSRHRWPLALVVQVNQ
> 1525
>
> Query: 121
> SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGXXXXXXXXXXXXXXXXXXX 180
> SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTG
>
> Sbjct: 1526
> SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGPDNDTPANDTNNDTDKDTA
> 1585
>
> Query: 181
> GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 240
>
> GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID
> Sbjct: 1586
> GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID
> 1645
>
> Query: 241
> IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 300
>
> IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK
> Sbjct: 1646
> IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK
> 1705
>
> Query: 301 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 339
> APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF
> Sbjct: 1706 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 1744
>
>
> >F07C7.1 CE07032 (ST.LOUIS) TR:Q19161 protein_id:AAA85753.1
> Length = 1879
>
> Score = 463 bits (1192), Expect = e-131
> Identities = 239/339 (70%), Positives = 248/339 (72%), Gaps = 31/339 (9%)
>
> Query: 1
> MINNRPLVAHARSPNDMITLRPMDFMIPGVMIEXXXXXXXXXXXXXXXXXXXXHLEKFES 60
> MINNRPLVAHARSPNDMI LRPMDFMIPGVMIE
> HLEKFES
> Sbjct: 1572
> MINNRPLVAHARSPNDMIALRPMDFMIPGVMIETPRTPADSPTTSTTEIRTRAHLEKFES
> 1631
>
> Query: 61
> ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 120
> ALERLWTIWTFGVMLILREVSHKHKRCCD KPEVGDVVIIN
> NYVSRHRWPLALVVQVNQ
> Sbjct: 1632
> ALERLWTIWTFGVMLILREVSHKHKRCCDPKPEVGDVVIINTNYVSRHRWPLALVVQVNQ
> 1691
>
> Query: 121
> SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGXXXXXXXXXXXXXXXXXXX 180
> SKRDGEIRTAV LIPLETSRQ+IRHGTG
>
> Sbjct: 1692
> SKRDGEIRTAV--------------LIPLETSRQDIRHGTGPDNDTPANDTNNDTDKDTA
> 1737
>
> Query: 181
> GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 240
> GSDQCR CPTLPTPALLDFENSH A+ + P ++
> EI
> Sbjct: 1738
> GSDQCRPCPTLPTPALLDFENSHFARRSQP-----------------KFSRTSVKNLEIS
> 1780
>
> Query: 241
> IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 300
> ++ GPDYDTNNPLF EDGE EDRPVEYVDP TAIPEIAYD+AETRLP
> GRTREYLGRKAK
> Sbjct: 1781
> LWIGPDYDTNNPLFHEDGEAEDRPVEYVDPITAIPEIAYDNAETRLPQGRTREYLGRKAK
> 1840
>
> Query: 301 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 339
> APYINYNHAEITRVLS+PSPPECCRFPVIPQESLNLKDF
> Sbjct: 1841 APYINYNHAEITRVLSDPSPPECCRFPVIPQESLNLKDF 1879
>
>
> Database: wp63
> Posted date: Sep 14, 2001 11:28 AM
> Number of letters in database: 8,819,854
> Number of sequences in database: 20,100
>
> Lambda K H
> 0.320 0.139 0.435
>
> Gapped
> Lambda K H
> 0.267 0.0410 0.140
>
>
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Hits to DB: 5,848,008
> Number of Sequences: 20100
> Number of extensions: 236898
> Number of successful extensions: 507
> Number of sequences better than 1.0e-100: 3
> Number of HSP's better than 0.0 without gapping: 2
> Number of HSP's successfully gapped in prelim test: 1
> Number of HSP's that attempted gapping in prelim test: 498
> Number of HSP's gapped (non-prelim): 3
> length of query: 339
> length of database: 8,819,854
> effective HSP length: 98
> effective length of query: 241
> effective length of database: 6,850,054
> effective search space: 1650863014
> effective search space used: 1650863014
> T: 11
> A: 40
> X1: 16 ( 7.4 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 41 (21.8 bits)
> S2: 930 (362.8 bits)
> BLASTP 2.2.1 [Jul-12-2001]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs", Nucleic Acids Res. 25:3389-3402.
>
> Query= ZK994.6 CE15493 (ST.LOUIS) TR:O44088 protein_id:AAB88612.1
> (113 letters)
>
> Database: wp63
> 20,100 sequences; 8,819,854 total letters
>
> Searching.........................................done
>
> ***** No hits found ******
>
> Database: wp63
> Posted date: Sep 14, 2001 11:28 AM
> Number of letters in database: 8,819,854
> Number of sequences in database: 20,100
>
> Lambda K H
> 0.316 0.129 0.372
>
> Gapped
> Lambda K H
> 0.267 0.0410 0.140
>
>
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Hits to DB: 1,625,059
> Number of Sequences: 20100
> Number of extensions: 53280
> Number of successful extensions: 140
> Number of sequences better than 1.0e-100: 0
> Number of HSP's better than 0.0 without gapping: 0
> Number of HSP's successfully gapped in prelim test: 0
> Number of HSP's that attempted gapping in prelim test: 140
> Number of HSP's gapped (non-prelim): 0
> length of query: 113
> length of database: 8,819,854
> effective HSP length: 89
> effective length of query: 24
> effective length of database: 7,030,954
> effective search space: 168742896
> effective search space used: 168742896
> T: 11
> A: 40
> X1: 16 ( 7.3 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 41 (21.6 bits)
> S2: 922 (359.8 bits)